PREFACE


The various chapters of this book were originally
presented as papers at a conference on psychological
scaling held at Princeton, New Jersey, in May 1958.

Since the war there had been a number of interesting
theoretical developments in the scaling field and many 
applications of scaling techniques in widely differing 
contexts. Some research workers, for example, were con- 
centrating on measurement of sensory magnitudes, others 
primarily on choice behavior, and others on measurement 
of attitudes. It seemed desirable to bring together 
investigators in these various fields to consider the 
present status and possible next steps in developing the 
theory and applications of psychological scaling methods. 
There was a wide general interest and enthusiasm shown 
for a conference on this topic. 

Such a conference was also thought appropriate to mark 
the tenth anniversary of the Psychometric Fellowship 
Program, which was inaugurated through the cooperation of 
Educational Testing Service and Princeton University. In 
the spring of 1948 arrangements were completed to admit 
Bert Green and Warren Torgerson as the first Psychometric 
Fellows. Since then about 25 Fellows have been admitted 
to the program. We have found that students who have 
been encouraged to study mathematics during their under- 
graduate years bring a valuable preparation to their 
graduate study of psychology and psychometrics. Programs 
emphasizing the combination of mathematics and psychology 
are now under way in various universities. Extension and 
continuation of such training should contribute to a
significant acceleration of the development of psychology
as a quantitative rational science.

Although the conference was broadly conceived to cover
a wide range of psychological scaling theory and applications,
it was organized around five general topics to
provide a context for discussion. The first session,
covering Chapters 2 to 5 of the present volume, dealt
with some of the properties of category scales and quantitative
estimation scales, and their implications for the
nature of psychological judgments under various conditions.
In Chapter 2, Lyle Jones of the University of North Carolina
presents some invariant empirical findings obtained
with the method of successive intervals. A comparison
of category and magnitude estimation scales of lightness
and of darkness is given in Chapter 3 by Warren Torgerson
of the Massachusetts Institute of Technology, and in
Chapter 4, Roger Shepard of the Bell Telephone Laboratories
presents a treatment of stimulus similarity in terms of
psychological distance models. In Chapter 5, Bert Green
of the Massachusetts Institute of Technology discusses a
critical property of the method of successive categories
using interval means.

The second session of the conference centered around 
problems in psychophysical scaling. In Chapter 6, S. 
Smith Stevens of Harvard University compares properties 
of ratio, category, and confusion scales. In Chapter 7, 
William McGill of Columbia University discusses indi- 
vidual differences in the judgment of loudness and the 
problems such differences introduce in determining the 
slope of a loudness function.

At the third session of the conference, discussion of
scaling in the context of attitude measurement was given
by Paul Lazarsfeld of Columbia University (Chapter 8).
Similar problems of measurement along a latent continuum
of true scores were presented in the context of ability
testing by Frederic Lord of Educational Testing Service
(Chapter 9).

The fourth session of the conference, covering the 
topic of choice and the measurement of utility, focussed
attention on a relatively new approach to the measurement 
of subjective value that has been gradually developing 
among economists working in collaboration with mathematicians
and, more recently, with psychologists. This
approach is marked by a formal axiomatic explication of
models and of measurement properties, as well as the use 
of game theory and decision-making situations for empir- 
ical verification of hypotheses. In Chapter 10, Ward Ed- 
wards of the University of Michigan discusses additive 
and non-additive models for maximization of subjectively 
expected utility and the implications of these models for 
the measurement of value and of subjective probability. 
In Chapter 11, Sidney Siegel of Pennsylvania State Uni- 
versity presents an axiomatization of higher ordered 
metric scaling and discusses the predictive efficiency of 
the model and its application in a decision situation.

In the same context, R. Duncan Luce of the University of 
Pennsylvania developed some empirical consequences of a
choice axiom and discussed some experimental findings in 
support of his theory. Dr. Luce's material has not been 
included in the present volume, since he felt that a much 
more extensive and adequate treatment of his axiom and 
its consequences was available in Luce (1959a). 

The last session, covering Chapters 12 to 14, dealt 
with various aspects of multidimensional scaling. In 
Chapter 12, Clyde Coombs of the University of Michigan 
compares his multidimensional unfolding technique with 
factor analysis as a method of recovering both individual
preferences and a social utility scale. In Chapter 13,
Ledyard Tucker of Educational Testing Service
distinguishes between multidimensionality of a perceived 
stimulus domain and multidimensionality due to individual 
differences in affective responses; he then discusses a 
multidimensional vector model for the latter case. 
Finally, in Chapter 14, Robert Abelson of Yale University 
presents a generalization of discriminant analysis for 
deriving scales from variance components in multi-way 
tables. 

From the foregoing review it can be seen that this 
volume represents a broad survey of the field of scaling. 
We have discussions of various scaling methods, such as
category scales, quantitative estimation scales, response 
scales, and multidimensional scales. Various applications 
of scaling are also illustrated by the measurement of 
utility, sensory magnitudes, mental abilities, and attitudes.
It is hoped that bringing material of this range
together in one volume will help to emphasize the scope
and generality of the scaling methods and also highlight
the similarities and differences encountered in applying 
scaling methods in different areas. 

For those interested in further study of the field of 
psychological scaling methods, we may note that a bibli- 
ography is included in this volume. A guide to general 
reading in this area may be found in the following references:
Guilford (1954), Gulliksen (1959), Lord (1954),
Messick and Abelson (1957), Thurstone (1959), and Torger- 
son (1958). 

Special thanks and appreciation are due to Educational
Testing Service, the Office of Naval Research, and Princeton
University for their encouragement in sponsoring this
conference and for their various contributions which made 
it possible to hold the meetings. 

Thanks are also due to the chairmen and discussants at 
the various sessions of the conference, whose remarks 
helped to stimulate an active and rewarding interchange 
of viewpoints: Dr. Herbert Solomon of Columbia University,
Dr. Bert Green of Massachusetts Institute of Technology,
Dr. Eugene Galanter of the University of Pennsylvania,
and Drs. Oskar Morgenstern, Carroll Pratt, Frederick
Stephan, and Samuel Wilks of Princeton University. The
editors also gratefully acknowledge the assistance of
Jean Herman, Ann King, Sally Matlack, Lois Righter, and 
Sharon Wolozin in preparing the manuscript for publica- 
tion. 

Reproduction, translation, publication, use and dis- 
posal in whole or in part by or for the United States 
Government is permitted, except for the following previously
copyrighted material: Chapter 2, Some Invariant
Findings Under the Method of Successive Intervals, 
pages 7-20; and Table 9-1 from Chapter 9, Inferring the 
Examinee's True Score, page 105. Permission to repro- 
duce these materials must, of course, be obtained from 
the original publishers. 


Harold Gulliksen
Samuel Messick
Princeton, N. J., 1959


Introduction and Historical Background


Since the middle of the twentieth century there has
been a marked growth of interest in using mathematics to
aid in stating and testing psychological theories. This
growth has been particularly striking in psychological
testing, in learning, and in psychological scaling. This
book presents one aspect of this development--psychological
scaling methods. These may be generally characterized
as methods for dealing in a quantitative fashion
with qualitative subjective judgments.

Here I shall emulate Stephen Leacock's professor and
go back a few hundred years in order to get a running
start. For several centuries philosophers talked about
the mind-body problem--just as everyone talks about the
weather--but nobody did anything about it. At last,
[...] of the nineteenth century, several [...]
decided to do something about it [...]
quantitative approach to this problem.
[...] suggested that the sensation intensity
was prop[...] intensity. Stevens (1957b) has recently
called attention to Plateau's prior suggestion of a power
function.

L. L. Thurstone (1927a) pointed out that there were
two classes of psychophysical methods. One class, such
as the method of average error, or the method of minimal
changes, required that the experimenter be able to obtain
some physical measurement of the stimulus, and to control
[...] purposes of his experiment. A
second class of experimental methods in psychophysics,
such as the method of paired comparisons, could be readily
applied in cases where precise measurement and controlled
variation of the physical characteristics of the stimuli
were not possible.

I remember hearing Thurstone present his ideas on the 
development of psychophysics to a psychology seminar at 
Ohio State University in 1920. He pointed out that by 
developing psychophysical methods of this second type, 
one might secure a powerful set of tools for the quantitative
measurement of numerous subjective qualities, such
as the relative esthetic merit of pictures, the relative
desirability of a set of possible birthday gifts, the
merit of English compositions, nationality preferences,
or the attitudes of different social groups. L. L. 
Thurstone (1959) presents the development of his ideas 
in this field from 1927 to 1954. 

Among the psychophysical methods of this second type,
the one which required the minimum of skill from the 
subject and which provided the greatest set of possible 
checks on the internal consistency of the subject's be- 
havior was the method of paired comparisons.  Thurstone 
concentrated considerable attention on this method and 
was the first, as far as I am aware, to develop a theo- 
retical foundation which would provide a method of ana- 
lyzing paired comparisons judgments, that would check on 
the validity of the assumptions used in developing the 
theory, and give scale values. The result was the Law
of Comparative Judgment (Thurstone, 1927a; Thurstone,
1927b).

Briefly describing this law, we may say that it uses 
what may be termed a linearity assumption, a difference 
assumption, and a normality assumption. Using these 
three assumptions, Thurstone developed the Law of Com- 
parative Judgment for dealing with data gathered by the 
method of paired comparisons and demonstrated that the 
validity of these assumptions could be verified by the 
agreement among various values obtained for each difference
(S_i - S_j). Mosteller (1951) has furnished a statistical
test for goodness of fit, so that we can now say
of any set of paired comparisons data that it is in 
agreement with the Law of Comparative Judgment, on the 
one hand, or, on the other, that the Law of Comparative 
Judgment and the data are not in complete agreement. 
Gulliksen and Tukey (1958) have given a method for 
assessing the magnitude of the disagreement. 
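
The scaling step described above can be sketched in a few lines. This is only a minimal illustration under the simplest equal-dispersion form of the law (often called Case V), which is an assumption here and not something the text specifies; the function names `case_v_scale` and `difference_estimates` and the input matrix `p` are hypothetical. Each proportion is converted to a normal deviate, row means give the scale values, and the several estimates of one difference (S_i - S_j) can be compared for agreement.

```python
from statistics import NormalDist, mean


def case_v_scale(p):
    """Scale values from a matrix of paired-comparison proportions.

    p[i][j] is the proportion of judgments "stimulus i is x-er than
    stimulus j"; every off-diagonal entry must lie strictly between 0 and 1.
    Assumes equal discriminal dispersions (an assumption of this sketch).
    """
    nd = NormalDist()
    n = len(p)
    # Normal deviate for each proportion; the diagonal is set to zero.
    z = [[0.0 if i == j else nd.inv_cdf(p[i][j]) for j in range(n)]
         for i in range(n)]
    # Row means give scale values with their mean fixed at zero.
    scale = [mean(row) for row in z]
    return scale, z


def difference_estimates(z, i, j):
    """One estimate of (S_i - S_j) from each comparison stimulus k.

    If the assumptions hold, these estimates should agree closely.
    """
    return [z[i][k] - z[j][k] for k in range(len(z))]


if __name__ == "__main__":
    # Hypothetical proportions for three stimuli, for illustration only.
    p = [[0.50, 0.31, 0.16],
         [0.69, 0.50, 0.31],
         [0.84, 0.69, 0.50]]
    scale, z = case_v_scale(p)
    print(scale)
    print(difference_estimates(z, 2, 0))
```

The spread among the values returned by `difference_estimates` is an informal version of the internal check described in the text; a formal test of fit would follow Mosteller (1951).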

Other theoretical approaches have been developed which
may be used with data gathered by the method of paired
comparisons. For example, Duncan Luce (1959a), as well as
Bradley and Terry (1952), used what we may term a linearity
assumption, a ratio assumption, and a logistic
assumption to develop a law of choice behavior. Although
theoretically this law is completely distinct from the 
Law of Comparative Judgment as stated by Thurstone, 
interestingly enough, it turns out that any paired com- 
parisons data (involving experimental error) which will 
fit one theory will also fit the other, and if a set of
paired comparisons data deviates from one theory, it will 
also deviate from the other. One would need something of 
the order of 10,000 to 50,000 judgments for each propor- 
tion before it would be possible to have one theory fit 
and the other not fit a given set of data.
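
A rough numerical sketch may help show why the two theories are so hard to separate. It compares the normal curve with a logistic curve whose argument is rescaled by 1.702, a commonly used matching constant that is an assumption of this sketch and does not come from the text. The function names are hypothetical. The second function gives the number of judgments at which the binomial standard error of a single proportion falls to a chosen size.

```python
import math
from statistics import NormalDist


def max_normal_logistic_gap(scale=1.702, limit=6.0, steps_per_unit=1000):
    """Largest absolute gap between the normal and rescaled logistic curves.

    Evaluated on a grid from -limit to +limit. The constant 1.702 is a
    conventional choice, assumed here for illustration.
    """
    nd = NormalDist()
    worst = 0.0
    n_steps = int(limit * steps_per_unit)
    for i in range(-n_steps, n_steps + 1):
        x = i / steps_per_unit
        logistic = 1.0 / (1.0 + math.exp(-scale * x))
        worst = max(worst, abs(nd.cdf(x) - logistic))
    return worst


def judgments_for_standard_error(se, p=0.5):
    """Judgments needed for the standard error of a proportion to equal se."""
    return math.ceil(p * (1.0 - p) / se ** 2)


if __name__ == "__main__":
    gap = max_normal_logistic_gap()
    print("largest gap between predicted proportions:", round(gap, 4))
    # Judgments needed before sampling error is about half that gap.
    print("judgments per proportion:", judgments_for_standard_error(gap / 2))
```

Since the two curves never differ by as much as 0.01 in predicted proportion, the sampling error of each observed proportion would have to be smaller still before one theory could fit and the other fail. For a proportion near 0.5, a standard error of 0.005 already requires about 10,000 judgments.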

I have stressed the method of paired comparisons primarily
because, as far as I can see, it has certain
unique characteristics:

1. From the subject's point of view, he judges with
respect to some characteristic or attribute--brightness
or saturation of a color, or the strength of an attitude
statement--let us call this quantity "x." He is presented
with two objects and answers the simple question, "Which
is x-er?" Each judgment classifies the presentation into
one of two possible categories, and thus secures a minimum
of information--one bit--from the subject.

2. Since each judgment is reduced to the simplest
possible terms, a large number of judgments are required
to give the requisite amount of information or the appropriately
large number of degrees of freedom to allow the
experimenter to check on the reliability and the consistency
of the subject and on the goodness of fit for
theo[...]

[...] for all possible pairs of stimuli
for sets in the range from 10 to 20 stimuli furnishes
sufficient degrees of freedom in the data so that the
experimenter can check on several important aspects of
his experiment:

1. Are the data reliable? Do two different experiments
give reasonably similar results?

2. Are the subjects consistent? Do the percentages
of judgments "a x-er than b," "b x-er than c," "c x-er
than a," etc., fit into a pattern consistent with the
linearity, normality, and difference assumptions?

3. Are the data in general in agreement or disagreement
with the Law of Comparative Judgment or some other
appropriate law?

4. What quantitative linear scale for the stimuli is 
implied by the data? 
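
To make the phrase "sufficient degrees of freedom" concrete, the following small sketch counts the distinct pairs for a set of n stimuli and the degrees of freedom left after n - 1 scale values have been fitted. The count (n - 1)(n - 2)/2 is the figure usually associated with a chi-square test of the equal-dispersion model; treating it as the relevant count here is an assumption of this sketch, and the function names are hypothetical.

```python
def pair_count(n):
    """Number of distinct pairs among n stimuli."""
    return n * (n - 1) // 2


def residual_degrees_of_freedom(n):
    """Pairs minus the n - 1 free scale values (the origin is arbitrary)."""
    return pair_count(n) - (n - 1)


if __name__ == "__main__":
    for n in (10, 15, 20):
        print(n, pair_count(n), residual_degrees_of_freedom(n))
```

For 10 stimuli there are 45 pairs and 36 residual degrees of freedom; for 20 stimuli, 190 pairs and 171 residual degrees of freedom.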

The paired comparison method also has the interesting 
characteristic that it generalizes readily from the 
linear to the multidimensional situation. The method of 
tetrads may be regarded as paired comparisons for inter-object
similarities, or, we may also say, for inter-object
differences. In this method the subject judges
not with respect to any specified attribute, such as
brightness, pitch, etc., but only with respect to the
characteristic of relative similarity or difference.
Instead of using only two stimuli for the comparison,
three or four stimuli are needed in order to specify a
pair of inter-object distances. The subject is presented
with four stimuli organized as A1, A2, and B1,
B2. Again one bit of information is secured from this 
more complex presentation. He answers the question, 
"Which pair is more alike?" Since a minimum of informa- 
tion is obtained from each judgment, a large number of 
judgments are required to enable the experimenter to 
determine the reliability and consistency of his subjects, 
the goodness of fit of theory to data, the interpoint 
distances between the stimuli, and the dimensionality 
of the space determined by the stimuli (cf. Messick,
1956a; Torgerson, 1958).

Both of these methods, the method of paired compari- 
sons and the method of tetrads, or paired comparisons 
for stimulus similarities, represent one extreme. A 
simple judgment is required of the subject and a single
bit of information is obtained from the judgment. This 
means that a large number of judgments are required, 
making the method experimentally very laborious, par- 
ticularly if one wished to investigate a large stimulus 
domain. Therefore, considerable attention has been paid 
to development of procedures that demand more informa- 
tion per judgment from the subject and hence require
fewer judgments and enable the experimenter to investigate
a larger set of stimuli.

Let us indicate briefly the nature of methods at this 
other extreme. Again, as with paired comparisons, we 
wish to determine psychological scale values for a set
of objects. In these methods a more complex judgmental
burden is placed on the subject. This secures, we may 
say, more information from a single judgment and hence 
requires fewer judgments. These methods may be used to 
investigate a larger set of stimuli and usually require 
a less extensive theoretical structure. There may, of 
course, be linear or nonlinear relationships between 
scales obtained by different scaling methods.

For example, instead of asking the subject for one
bit of information, "Is i x-er than j? Answer yes or
no," he can be asked, "How many inches long is this
rod?" "Give a decibel value for this sound," or, more
generally, "What is the ratio of i to j? Answer with
the appropriate whole number or proper fraction or
improper fraction."

We can now indicate the scope of the present volume.


In general the chapters of this book will deal relatively 
little with either of the extreme types I have just 
mentioned--methods obtaining only one bit of information 
from each judgment, or those requiring a complete quan- 
tification from the subject. We will be dealing here 
for the most part with methods that make a compromise 


between these two extremes. 
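
The contrast in information per judgment can be put in rough numerical terms. The sketch below assumes, purely for illustration, that every response alternative is equally likely, so that a judgment with k alternatives carries at most log2(k) bits; the function name is hypothetical and real responses will generally carry less.

```python
import math


def max_bits_per_judgment(k):
    """Upper bound on information from one judgment with k alternatives."""
    return math.log2(k)


if __name__ == "__main__":
    # Paired comparison (2), seven-category and nine-category ratings.
    for k in (2, 7, 9):
        print(k, round(max_bits_per_judgment(k), 2))
```

A paired comparison yields one bit at most, while a nine-category rating can yield a little over three, which is the sense in which the intermediate methods require fewer judgments.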


2 Lyle V. Jones 
University of North Carolina 


Some Invariant Findings 
Under the Method 


of Successive Intervals 


1 The use of data and research reports prepared under research
contract with the Quartermaster Food and Container Institute of
the Armed Forces is acknowledged. Permission to reproduce this
text, which first appeared in The American Journal of Psychology,
June 1959, Vol. LXXII, pp. 210-220, has been granted through the
courtesy of Karl M. Dallenbach, editor.

I suspect that we would all agree that invariance of
scaling results is an essential attribute of a scaling
method. The worth of a method is established only after
it has been demonstrated that results do remain constant
over a range of empirical situations with differing incidental
conditions. As S. S. Stevens (1948, p. 21) has
remarked, "...the scientist seeks measures that will stay
put while his back is turned."

In this paper I will try to specify some of the conditions
over which estimates of scale parameters obtained
by the method of successive categories have been demonstrated
to remain relatively invariant.

The method of successive categories applies to data
where each individual of a group assigns to each of a set
of stimuli a category rating. Stimuli might be consumer
items to be rated according to preference. They might
be artistic productions to be rated according to esthetic
value. They could be verbal statements rated according
to degree of belief. Or the ratings might pertain to
stimuli with clear physical counterparts, such as pitch
or loudness of tones, brightness of visual forms, or
heaviness of objects. The purpose of the method of
successive categories, often also called the method of
successive intervals, is to transform the rating categories
in accordance with the scaling model so that the
numerical values may be taken to characterize responses
to the stimuli on an equal-interval scale. It is important
not to confuse the method of successive categories
with what Stevens (1957b) has called category rating
scales. As will be demonstrated, results obtained by
the method of successive intervals are markedly different
from those which depend upon arbitrary assignment of
integers to categories of a rating scale.
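
The transformation can be illustrated with a simplified, one-pass sketch. It is not the computing procedure used for the results reported in this chapter; it is an approximation assumed here for illustration, and the function name `successive_intervals` and the layout of `counts` are hypothetical. Cumulative proportions are converted to normal deviates, category boundaries are taken as the mean deviate over stimuli, and each stimulus's scale value and discriminal dispersion are read from the straight line relating the boundaries to that stimulus's deviates.

```python
from statistics import NormalDist, mean


def successive_intervals(counts):
    """Simplified successive-intervals solution (illustrative sketch).

    counts[j][g] is the number of ratings of stimulus j in ordered
    category g. At least three categories are needed. Returns the
    boundary estimates, scale values, and discriminal dispersions.
    """
    nd = NormalDist()
    z = []
    for row in counts:
        total = sum(row)
        cumulative, deviates = 0, []
        for c in row[:-1]:  # k categories give k - 1 boundaries
            cumulative += c
            # Clamp to avoid infinite deviates at proportions of 0 or 1.
            p = min(max(cumulative / total, 1e-4), 1.0 - 1e-4)
            deviates.append(nd.inv_cdf(p))
        z.append(deviates)

    n_bounds = len(z[0])
    # Boundary estimates: mean deviate over stimuli at each boundary.
    bounds = [mean(zj[g] for zj in z) for g in range(n_bounds)]
    b_bar = mean(bounds)

    scale, dispersion = [], []
    for zj in z:
        z_bar = mean(zj)
        sxx = sum((v - z_bar) ** 2 for v in zj)
        sxy = sum((v - z_bar) * (b - b_bar) for v, b in zip(zj, bounds))
        sigma = sxy / sxx  # slope of boundaries on this stimulus's deviates
        dispersion.append(sigma)
        scale.append(b_bar - sigma * z_bar)  # intercept is the scale value
    return bounds, scale, dispersion
```

The origin and unit of the result are fixed only by the averaging convention used for the boundaries, so scale values from two analyses are comparable up to a linear transformation unless a common origin and unit are imposed.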


Total Invariance When Anchoring Phrases and Number of 
Rating Categories Are Changed 

Total invariance is achieved when, over changed condi- 
tions, not only do scale parameters associated with a set 
of marker stimuli remain unchanged, but also the scale 
unit and the scale origin remain fixed. 

Consider the findings exhibited in Figures 2-1 and 
2-2. The names of 20 food items were presented on each 
of two successive-category rating forms. One form contained
nine rating categories, the other six. Descriptive
adjectives served as labels for the categories.

Each form was administered to a different sample of 
approximately 900 Army enlisted men, with instructions 
to check categories which best indicated their degree of
like or dislike for each food item. 


[...] 0.00 was assigned the midpoint of the rating form,
representing a judgment of "Neither Like Nor Dislike."


The relation between the two sets of independently
determined scale values, Figure 2-1, not only is linear,
but clearly is fitted well by a line of unit slope and
zero intercept, thus indicating total invariance of the
two sets of scale values. The same can be said of the
discriminal dispersions, Figure 2-2. While there is
considerably great[...] for dispersion estimates,
Figure 2[...]

FIGURE 2-1. Estimated Successive-Intervals Preference Scale
Values, 20 Food Items, Obtained When a Six-Category Rating Form
Was Administered to 909 Subjects and a Nine-Category Rating
Form Was Administered to 912 Different Subjects. (Horizontal
axis: Form A; vertical axis: Form B.)

Form B presents six rating categories labeled Dislike Extremely,
Dislike Very Much, Mildly Dislike, Mildly Like, Like
Very Much, and Like Extremely. Form A presents nine rating
categories labeled Dislike Extremely, Dislike Very Much, Dislike
Moderately, Dislike Slightly, Neither Like Nor Dislike,
Like Slightly, Like Moderately, Like Very Much, and Like Extremely.

FIGURE 2-2. E[...] to the Scale Values of Figure 2-1.

It may be concluded that when the same stimulus items
are presented to different random samples from the same
population, results of scaling by the method of successive 
categories are identical over changes in the number of 
categories on the rating form and changes in the descrip- 
tive-phrase labels of the categories. Neither of these 
forms of invariance is found for statistics computed from 


arbitrary integers assigned to the categories. 
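
The comparison of two independently determined sets of scale values can be checked with an ordinary least-squares line, as in the sketch below. The function name `fit_line` and the numbers in the example are hypothetical and are not the values plotted in Figure 2-1. Total invariance in the sense used here corresponds to a slope near one and an intercept near zero; a slope different from one, or a non-zero intercept, would indicate a change of unit or of origin.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical scale values for five items on two rating forms.
    form_a = [-0.40, 0.10, 0.55, 0.90, 1.30]
    form_b = [-0.35, 0.05, 0.60, 0.85, 1.35]
    slope, intercept = fit_line(form_a, form_b)
    print(round(slope, 3), round(intercept, 3))
```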


An Invariance When Method Is Changed: Empirical Con- 
firmation of the Normality Assumption 

The method of successive categories, as typically 
applied, assumes univariate normality of each hypothetical 
preference distribution, taken over subjects. Rozeboom 
and Jones (1956) have demonstrated analytically that the 
error introduced by moderate departures from normality 
is slight. Moreover, Adams and Messick (1958) recently 
have suggested certain direct testable consequences of 


the normality assumption. 

An alternative empirical check on the normality assumption,
depending upon an alternative scaling model developed
by Thurstone (1954), has been investigated by
Jones and Thurstone (1954). Let each individual be asked
to give repeated ratings on a rating form. Assume that
his successive attempts to rate the same stimulus reflect
a normal error distribution over the hypothetical psychological
continuum. As with the method of successive
categories, we do not assume that the distributions have
the same variance for all stimuli.

In particular, let us ask the subjects to rate a set
of stimulus items on two distinct occasions. By varying
order of stimuli and the appe[...]
is possible to reduce objections from the subject to
doing the task twice.

The basic data to be analyzed are conditional distributions
of response to items on the second administration,
given a particular response to corresponding items on the
first administration. With a seven-category questionnaire,
we obtain seven such conditional distributions for
every stimulus item. In general, where n is the number
of stimulus items and k is the number of categories, we
would obtain nk distributions. Estimates of scale values
of interval boundaries are obtained from that transformation
which is most effective in simultaneously normalizing
each of the nk distributions.
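
The tabulation of these conditional distributions can be sketched as follows. The record layout (item, first rating, second rating), the function name, and the example ratings are assumptions made for illustration; only the tabulation is shown, not the normalizing transformation itself.

```python
from collections import defaultdict


def conditional_distributions(ratings, k):
    """Tabulate second-administration ratings given the first rating.

    ratings is an iterable of (item, first, second) records, with
    categories numbered 1..k. Returns a dict keyed by (item, first)
    whose values are lists of k counts over the second rating.
    With n items there are at most n * k such distributions.
    """
    table = defaultdict(lambda: [0] * k)
    for item, first, second in ratings:
        table[(item, first)][second - 1] += 1
    return dict(table)


if __name__ == "__main__":
    # Hypothetical records for one food item on a seven-category form.
    records = [("bread", 3, 3), ("bread", 3, 4), ("bread", 5, 5)]
    for key, dist in conditional_distributions(records, 7).items():
        print(key, dist)
```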

[...] names of the same 20 food items, were administered to a
group of approximately 250 Army enlisted men. The 20
foods were identical to those scaled from ratings of 900
Army enlisted men, reported earlier. When the successive-
intervals scale values of the 20 items are plotted against
the scale values obtained from repeated a[...] the two
sets of scale values are inv[...]
of the relationship; the dif[...]
the two scaling methods is the onl[...]
divergence of results.


From Figure 2-4 it is apparent that the same relationship
holds for estimates of standard deviations of the
preference distributions, but to a less precise linear
[...]

FIGURE 2-4. Successive Intervals Discriminal Dispersions
[symbol illegible] Plotted Against Discriminal Dispersions
Derived from Repeated Administration [symbol illegible];
20 Food Items Rated on Seven-Category Form, N = 253.


In another study, the same procedures were applied to
ratings for nine-category preference schedules. Not only
were the forms of results the same as those represented
in Figures 2-3 and 2-4, the slope of the resulting relationship
was indistinguishable for the two studies. This
strongly suggests that the absolute invariance under 
changes in form of rating scale, already noted, applies 
to the method of scaling from repeated administration as 
well as to the method of successive categories. 

These comparisons of the method of successive categories 
with a method of repeated administration clearly support 
the generality of the former: It has been demonstrated
that when the assumption of normal preference distribution
is replaced by the innocuous assumption of normality of
errors from repeated administration, resulting estimates
of scale parameters are invariant up to a linear multiplier,
or a ratio transformation.


Invariance Inferred from Predictive Power 
The relation of predictive accuracy to invariance is
analogous to that of validity to reliability of tests.
Several sets of findings attest to the predictive power
of the method of successive categories. It has been demon[...]
[...]sons (e.g., Edwards & Thurstone, 1952; Saffir, 1937), when
the same stimuli are involved in both methods. Further,
from successive-intervals results, the prediction of
choice of one stimulus from three or more stimuli is accurately
achieved (Bock, 1956a). The scaling model can even
be extended to allow satisfactory predictions, from preference
parameters estimated by the method of successive categories,
of the relative frequency of purchase of consumer
items (Jones, 1956a). Such predictive success demonstrates
that the scale results are not transient and are
[...]urement but rather are invariant constructs applicable
over a wide range of behavior. For only if the model
[...]


Invariance of Scale Under Changes of Both Stimuli and 
Subjects 


For some practical situations, it is convenient to have 
a "yardstick" for psychological measurement which remains 
stable over time and which can be repeatedly applied for 
a given measuring purpose. Such a situation is found, 
for example, in studies of consumer acceptance. It may 
be desirable to measure acceptance of different consumer 


A nine-category hedonic scale has been used for several
years by the Quartermaster Food and Container Institute
for the Armed Forces to assess food acceptance among
troops. While results have been summarized primarily in
terms of the assignment of the arbitrary integers one
through nine to responses in the categories, it has been
recognized that the resulting "scores" cannot always
legitimately be considered to fall on an equal-interval
scale.

A deterrent to the routine use of the scaling method
of successive categories for food acceptance survey data
has been the relative complexity of the computational
procedures. While the labor involved need not be great,
it is always more complex than th[...] arbitrarily
assigning numbers to the rating categories. This
deterrent would be removed if it were discovered that a
single scale transformation could legitimately be used to
determine the equal-interval scale values representative
of the category boundaries on the hedonic scale for all
national food acceptance surveys.

To investigate the degree of invariance of one equal-
interval food acceptance scale from another, data were
obtained from eight national food acceptance surveys
administered by the Quartermaster Food and Container
Institute between February 1950 and June 195[...]. The
surveys involved from 36 to 54 food items each and were
administered to samples ranging in approximate size from
2,500 to 6,000. The sample sizes for the eight surveys
appear in the legend of Figure 2-5.

The results from each s[...] separately treated
by the scaling method of successive [...]
[...] widths of the hedonic categories [...]
unit scale. [...] With the possible exceptions of results
for the first two surveys, the stability of interval width
is remarkable. It is to be remembered that the surveys
involve distinct samples of respondents and also differ
with respect to the foods to be evaluated.

In view [...] degree of stability [...] category
widths for the various surveys, a standard [...]
solution for category boundaries seems warranted ([...],
1956). Utilizing the standard solution for any o[...]
samples studied would provide estimates for scale [...]
and dispersions of the food [...]ly identical [...]
those obtained directly from [...]

FIGURE 2-5. Widths of Categories 2-8 of the Quartermaster
Hedonic Scale, Determined from Each of Eight Preference Surveys.

The Case Against Category Scales

Recent papers by Stevens (1957b) and Stevens and
Galanter (1957) may be interpreted as presenting a strong
case against the use of so-called category scales. Direct
interpretation of category ratings, with or without equal-
interval instructions to subjects, is severely limited by
lack of invariance. Ratings are clearly sensitive to
the form of distribution of stimuli, particularly to
stimulus spacing. Ratings also are greatly affected by
end effects of the rating form and by non-uniform ease
of discrimination over the entire range of stimuli. In
contrast, there is evidence to indicate that scale
values obtained by the method of successive categories
are relatively free from effects of stimulus spacing and
from end effects. And differences in discrimination
have a place in the model, affecting size of discriminal
dispersions but not scale values. It also should be
noted that while successive-intervals results may be used
successfully to predict consumer choice behavior, direct
prediction from category scales has not proven successful.

For a wide range of stimuli, an observed relation
between category means and successive-intervals scale
values is shown in Figure 2-6. The stimuli are 51
descriptive adjectives rated along [...] from a denotation
of greatest [...] greatest like (Jones & Thurstone, [...]).
[...] of the continuum, like or dislike, the relationship is
monotonic but concave to the abscissa. [...]tions are found
for preference, but since stimuli generally assessed are
more homogeneous, the [...] may show less marked departure
from linearity ([...] & Thurstone, 1954, p. 109).
A more striking discrepancy of category [...]
successive-intervals results [...] measures of dispersion.
Typically [...] or at least a near-zero linear correlation
is [...] between standard deviations of stimulus
distributions over the category scale and discriminal
dispersions found [...] intervals. This finding [...] to be
[...] factors. A component [...] end effects [...]
dispersion on [...] stimuli which on an equal [...]

FIGURE 2-6. Category Means (M) Plotted Against
Successive-Intervals Scale Values (S), 51 Adjectives Rated
on Nine-Category Form, N = 905.

[...]ns on the category scale and discriminal dispersions on
the equal-interval scale may be attributed to the unequal
category widths.
Summary

The findings regarding invariance of the results of
scaling are summarized in Table 2-1.

By the method of successive intervals [...] usually are
estimated to represent [...] equal-interval, psychological
scale [...] mean of the subjective distribution and the
discriminal dispersion or standard deviation. [...]
categories of a rating form, these estimates are related by
an identity transformation. Results are invariant up to a
linear multiplier when the assumption of normality of
subjective distribution is replaced by an assumption of
normal error distribution of differences in subjective
values obtained from repeated ratings (Situation 2).
Invariance of results up to a linear transformation was
inferred from the predictive power of those results
(Situation 5). Invariance also was displayed (Situation
[...]) when the same scale-form was used for ratings of
different sets of stimuli and was administered to distinct
[...]

The invariance of results from the method of succes- 
sive intervals over a wide range of conditions supports 
the adequacy of that scaling model and its potential 
usefulness for predictive purposes. 


3 Warren S. Torgerson


Massachusetts Institute
of Technology


Quantitative Judgment Scales


Stevens and Galanter (1957) and others have demonstrated
that the quantitative judgment methods fall into two general
classes--the magnitude methods and the category methods.

In the magnitude methods, such as the methods of
fractionation, numerical estimation, and the like, the
subject is supposed to adjust stimuli or assign numbers so
that the ratios between the subjective magnitudes involved
correspond to the numerical ratios between the corresponding
assigned numbers. For example, in the fractionation methods,
the subject adjusts a variable stimulus until it stands in a
prescribed ratio to a standard stimulus. Or, he may be
presented with a standard and a variable stimulus with
instructions to report the ratio between them. In the most
unrestricted magnitude method, the subject is simply
presented with a series of stimuli, one at a time, and told
to assign numbers to the stimuli so that the numbers
assigned are proportional to the subjective magnitudes of
the stimuli.
In the category methods, such as the method of
equisection, equal-appearing intervals, and the rating scale
methods, the subject is supposed to adjust stimuli or assign
numbers so that the intervals between the subjective
magnitudes correspond to the numerical differences between
the corresponding numbers. For example, in the method of
equisection, the subject is presented with two standard
stimuli. His task is to adjust one or more variable stimuli
to divide the interval between the standards into equal
steps. In the method of equal-appearing intervals, he sorts
the stimuli into piles so that the piles represent equal
steps along the continuum. In the rating methods, he assigns
stimuli to categories that are supposed to be equally spaced
along the continuum.

The research reported in this paper was supported jointly by
the U.S. Army, Navy, and Air Force under contract with the
Massachusetts Institute of Technology.

For some attributes--Stevens (1957b) calls them
metathetic attributes--the two classes of methods give
scales that are linearly related. Here, all is well. But for
another large group--the prothetic attributes--the scales
are clearly not linearly related. For the prothetic
attributes, when a scale is constructed so that numerical
ratios correspond to observed subjective ratios, then the
numerical differences do not correspond to observed
subjective differences, and vice versa.

The difference between the two is most neatly
demonstrated with the numerical estimation procedure--a
magnitude method--and the category rating method. At first
glance, the experimental procedures seem to differ only
trivially. Suppose we are given a set of stimuli. In the
[...] subjective magnitudes. We present the stimuli one at a
time. The subject uses any numbers that seem appropriate
[...] subject to rate the stimuli on an 11-point
equal-interval scale, using the numbers from zero through
10. We might anchor the scale by telling him that the bottom
boundary of the zero category is to represent a zero amount
of the attribute, and the largest stimulus is to define the
top of the tenth category. We might also show him the
largest and smallest stimuli in the set to indicate to [...]

Figure 3-1 illustrates the sort of relations that are
obtained among the two quantitative judgment methods and the
corresponding physical scale. The general form of the curves
is typical when the points represent median judgments from
groups of subjects. The precise shape, of course, would
depend upon the particular distribution of stimuli used and
the particular anchoring conditions of the experiment. The
category rating method is especially sensitive to the
former. The magnitude method is somewhat sensitive to the
latter.

FIGURE 3-1. Typical Relations Among Category, Magnitude and
Physical Scales. (Axis labels: category scale, magnitude
scale, log magnitude, log physical.)

Since the category scale is very nearly the logarithm
of the magnitude scale, it turns out that stimuli whose
differences are reported to be subjectively equal are [...]
by equal intervals on the category scale. Indeed, according
to the subjective reports, stimuli separated by subjectively
equal ratios are also separated by intervals [...]

Somehow, this is not the way things ought to behave.
If there is a single psychophysical law, then at least
one of the scales must be biased.
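
The logarithmic relation noted above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the category value is exactly intercept + slope * log10(magnitude); the function name, its parameters, and the magnitude values are hypothetical and are not taken from the experiments.

```python
import math

def category_value(magnitude, slope=1.0, intercept=0.0):
    # Assumed relation: category scale = intercept + slope * log10(magnitude).
    return intercept + slope * math.log10(magnitude)

# Hypothetical magnitude estimates standing in a constant ratio of 2.
magnitudes = [5.0, 10.0, 20.0, 40.0]
categories = [category_value(m) for m in magnitudes]

# Successive ratios on the magnitude scale and successive
# intervals on the (assumed) category scale.
ratios = [b / a for a, b in zip(magnitudes, magnitudes[1:])]
intervals = [b - a for a, b in zip(categories, categories[1:])]

print(ratios)
print(intervals)
```

Under this assumption, equal ratios on the magnitude scale appear as equal intervals on the category scale, which is the coincidence the text finds troubling.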

Dr. Stevens (1957b), of course, holds that it is the
magnitude methods that give the proper psychophysical
relationship. He has explained the results gotten by the
category methods as being caused by the subject confusing
[...]

I have been rather hoping that a champion would come
along for the category scales, if only to even things
out. It seems to me that there is much that can be said
in their favor.

Actually, I do not think either of the two methods is
in error. Instead, I suspect they reflect more or less
directly the two standard ways we have of regarding--and
using--number or quantity.

Suppose a rat and a horse each gain a pound during a
specified period of time. In one sense, the two gains
are equal. In another sense, however, the horse increased
his weight only slightly, whereas the rat's increase was
tremendous.

If we assign numbers so that a change from 10 to 11
pounds is the same size as a change from 1000 to 1001 [...],
we get one type of scale. We get a second type [...] by
assigning numbers so that the change from 10 [...] more
nearly equal to--say--a change from 1000 to 1100. [...]
scale turns out to be roughly the log of the [...]--and this
sounds familiar.

[...] we settle on the first type of scale--the kind [...]
says the rat and the horse gained the same amount. We would
find that ratios were the most useful of the [...] relations
among the objects. We would find [...] talking about
relative gains, percentage increases, or orders of
magnitude. We would round the numbers to a fixed number of
non-zero digits, and specify precision to 1 part in x. We
would also find that the [...] or quantities we actually use
tend to progress in approximately an exponential series.
[...] "round number tendency" is such that the numbers would
get rounder as we go up the scale.

If we settled on the second type of scale, we would find
that differences were the more useful. We would talk about
gains of so many units, would round to a fixed number of
decimal places and would specify precision to plus or minus
so many units. Numbers at the top of this scale are no
rounder than those at the bottom of the scale.

[...] of both [...] types are widely [...] physical
measurement. [...] the first type of physical scale, the
attribute--[...] just the way of me[...] it--is bounded only
[...] zero. The scale thus goes [...] zero to infinity.
Examples are length, weight, [...] resistance. Here we do
tend to use an exponent[...] numbers [...] fixed number of
non-zero [...] and, in general, to [...] the ratio
relations.

[...] second type of physical scale [...] attributes that
are bounded on both ends, [...] proportion, or time of day.
With these scales, [...] use an arithmetic series of
numbers, [...] fixed number of decimal places, [...]
differences. While a gain [...] or small depending upon the
wei[...] what did the gaining, the size of a[...] in time of
day stays put.
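
The two rounding habits described above (a fixed number of non-zero digits versus a fixed number of decimal places) can be contrasted in a short sketch. The function names and the sample numbers are illustrative assumptions only.

```python
import math

def round_to_digits(x, digits):
    # First type of scale: keep a fixed number of significant digits,
    # so precision is relative ("1 part in x").
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)

def round_to_places(x, places):
    # Second type of scale: keep a fixed number of decimal places,
    # so precision is absolute ("plus or minus so many units").
    return round(x, places)

large_by_digits = round_to_digits(1234.5, 2)   # numbers get "rounder" higher up
small_by_digits = round_to_digits(12.345, 2)
large_by_places = round_to_places(1234.567, 1)  # same absolute precision everywhere
small_by_places = round_to_places(12.345, 1)

print(large_by_digits, small_by_digits, large_by_places, small_by_places)
```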
It seems to me that the magnitude methods, with their
emphasis on the ratio relations among stimuli, give us 


Suppose we want a scale to measure an attribute from
the opposite direction, as, for example, when we measure
conductivity instead of resistance. In physical
measurement, if the original scale is of the type that is
bounded [...] if the proportion of black marbles in an urn
is .75, then the proportion of non-black marbles is
1.00 - .75 or .25. If a mark on a yardstick is 10 inches
from one end, it is 26 inches from the other end.

When the original scale extends from zero to infinity,
we cannot proceed in this way. Instead, we would reverse
the attribute by taking the reciprocal, as in the case of
resistance and conductivity.
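
The two ways of reversing a physical scale can be sketched as follows. The urn and yardstick figures come from the text; the function names and the 4-ohm resistance are hypothetical illustrations.

```python
def reverse_bounded(value, upper, lower=0.0):
    # Scale bounded on both ends: measure from the opposite end.
    return upper + lower - value

def reverse_unbounded(value):
    # Scale bounded only at zero: reverse by taking the reciprocal.
    return 1.0 / value

non_black = reverse_bounded(0.75, upper=1.00)  # proportion of non-black marbles
other_end = reverse_bounded(10, upper=36)      # inches from the other end of a yardstick
conductance = reverse_unbounded(4.0)           # hypothetical 4-ohm resistance

print(non_black, other_end, conductance)
```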

It seemed worthwhile to find out if the two forms of
quantitative judgment methods would behave the same way
when the attribute is reversed. I decided to look at the
black-to-white dimension of Munsell neutral gray color
chips.

This seemed like a particularly good attribute to look
at for several reasons. One direction of judgment does
not seem much more difficult to the subject than the [...]

[...] scale for lightness the reciprocal of the magnitude
scale for darkness?

Second: How are the two magnitude scales and the two
category scales related to the physical attribute of
reflectance?

Third: How do the four scales relate to Munsell value?
Four experiments were run--magnitude and category
judgments of lightness and magnitude and category judgments
of darkness. The experiments were run on different days
for a given subject.

The stimuli were 17 Munsell neutral gray papers, ranging
in equal steps of value from 1.5 to 9.6. The same stimuli
were used in all four experiments.

We used 16 subjects, divided into four groups of four.
The sequence of experiments was different for each group.

For the magnitude experiments, the standard was a gray
paper of 5.5 Munsell value. Subjects were told it
represented a numerical value of 10. The standard was
always present, and located just above the variable
stimulus. The variable stimuli were presented one at a
time. Instructions to the subject emphasized that he was
to assign numbers so that the numbers were proportional
to the lightness--or darkness--of the stimuli.

For the category experiments, the subject was instructed
to rate the [...] on an equal-interval scale from zero
through 10, where the lower bound of the zero category
was to represent zero and the top category, the lightest--
or darkest--stimulus in the series. He was shown the two
extreme stimuli at the beginning of the experiment to help
him anchor his scale. Stimuli were then presented one at a
time.

Each subject judged each stimulus five times in each
experiment.
The results of interest here are shown in Figures 3-2
through 3-6.

[...] relation between category ratings [...] The background
against which stimuli were presented was made of gray felt.
Its re[...] between the fifth and sixth papers.

Figure 3-3 shows the corresponding plot for the magnitude
[...]. Again, the [...] is apparent. If a straight line is
fitted to the lightness data, its slope will be very nearly
[...] value obtained by Stevens and Galanter (1957).

Magnitude Estimations vs. Reflectance.

Figure 3-4 shows the relation between magnitude and
category scales. For both lightness and darkness, the
category scale is almost the log of the magnitude scale.
Hence, both lightness and darkness appear to be prothetic
attributes.


FIGURE 3-4. Magnitude Estimations vs. Category Ratings.
(Legend: + dark, o light.)


Finally, Figures 3-5 and 3-6 indicate that the two
methods behave as expected. For the category methods,
lightness is the reverse of darkness, whereas, for the
magnitude methods, lightness is the reciprocal of darkness.

Thus, when measured on category scales, stimuli which
are equally spaced with respect to lightness are also
equally spaced with respect to darkness. In the category
scales, the distance [...] are invariant. Stimuli separated
by equal ratios of lightness, on the other hand, [...] of
darkness.

[...] for the magnitude methods. Here, ratio relationships
remain invariant [...] A set of stimuli equally spaced
according to lightness values are not equally spaced
according to darkness values.
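
The invariance claims above can be checked on made-up numbers. The sketch assumes an idealized category scale on which darkness is 10 minus lightness and an idealized magnitude scale on which darkness is a constant divided by lightness; all values, and the constant 100.0, are hypothetical.

```python
def steps(values):
    # Successive differences between neighbouring scale values.
    return [b - a for a, b in zip(values, values[1:])]

def ratios(values):
    # Successive ratios between neighbouring scale values.
    return [b / a for a, b in zip(values, values[1:])]

# Category method: darkness taken as the reverse of lightness (0-10 scale).
light_cat = [2.0, 4.0, 6.0, 8.0]
dark_cat = [10.0 - x for x in light_cat]

# Magnitude method: darkness taken as the reciprocal of lightness.
ratio_light = [5.0, 10.0, 20.0, 40.0]          # equal ratios of lightness
ratio_dark = [100.0 / x for x in ratio_light]
even_light = [10.0, 20.0, 30.0, 40.0]          # equal steps of lightness
even_dark = [100.0 / x for x in even_light]

print(steps(dark_cat))    # equal steps survive the reversal
print(ratios(ratio_dark)) # equal ratios survive the reciprocal
print(steps(even_dark))   # equal steps do not survive the reciprocal
```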
Category Ratings: Lightness vs. Darkness.

Magnitude Estimation: Lightness vs. Darkness.

It seems to me that the results throw some doubt on
the notion that the non-linearity of the two classes of
methods is caused by the subjects' confusion of
discriminability with distance. It sort of goes against the
grain to say that the discriminability of a particular pair
of stimuli is very good with respect to lightness, and very
poor with respect to darkness, even though the
discriminability stays the same with respect to the number
of times the two stimuli are confused.
I think the results also indicate, at least as far as 
the measurement of color is concerned, that the category 
method is to be preferred. The location of colors in a 
multidimensional space would seem to require a method of 
measurement for which distance relations are invariant 
over changes in direction. I do not see how one could 
hope for an adequate multidimensional representation of 
color if the distance between colors varies depending 


upon the direction of comparison. 


4 Roger Shepard 


Bell Telephone Laboratories 


Similarity of Stimuli 
and Metric Properties 


of Behavioral Data 


A wide variety of psychological experiments can be
devised to include one arbitrary collection of N stimuli.
For many of these experiments, moreover, the resulting
data are most satisfactorily displayed in an NxN matrix
in such a way that the entry, s_ij, at the intersection
of the i-th row and j-th column pertains to the ordered
pair of stimuli (S_i, S_j). Some concrete examples may
make this clear.


Amplitude of a Generalized Conditioned Response

For each of a number of subjects, a galvanic skin
response can be conditioned to one of the N stimuli by
repeated presentation of that stimulus together with
electric shock. The N stimuli can then be presented in a
random sequence without the shock and the galvanometer
deflection recorded for each presentation (Hovland, 1957).
The entry s_ij in this case is the mean deflection, when
S_j is presented, averaged over all those subjects for
whom the galvanic response was originally conditioned to
S_i.
Probability of an Overt Error During Paired-Associate
Learning

[...] number of subjects a different one-to-one assignment
[...] be established between the N stimuli and N highly
differentiated responses. Associative learning can then
proceed according to the rule that on each [...] to that
stimulus (Shepard, 1958b). The entry s_ij in this case is
the mean conditional probability, when S_i is presented, of
the response assigned to S_j.

This paper was read while the author was affiliated with
Harvard University.


[...]sented by itself and then two side-by-side, and so
that one of the two stimuli presented during the second part
is always identical to the single sti[...] during the first
part. [...] as quickly as possible [...] a left-hand or a
right-hand key [...]

Sorting Time

Each subject can [...]quence of stimuli, each of which [...]
two fixed alternatives, [...]ing instruction [...]sible into
two homogeneous and exhaust[...]

[...] operative and, second, to attribute this factor to the
[...] One circumstance ostensibly shared by all of these
various experiments, namely the [...]



[...] Moreover, this underlying factor, like the [...] of
data from which it is inferred, evidently pertains to pairs
of stimuli and, since in one experiment the [...] represent
direct judgments of similarity, this factor [...]
conveniently referred to as the similarity factor.

It might be possible, then, to single out one of these
alternative N-order matrices, proclaim it to be the
fundamental matrix of similarities, and proceed to show how
the other matrices can be derived from the fundamental
matrix by certain specified transformations. However,
since the corresponding entries in these various matrices
are evidently not linearly related, a degree of
arbitrariness attends any proclamation of this kind.
Furthermore, to express each of the different matrices as
some transformation of one prototypic matrix of similarities
is not necessarily to reduce the data to their most
epigrammatic form. For even the entries of such a
prototypic matrix are apparently subject to certain internal
constraints, since two stimuli that are both very similar
to a third must be at least moderately similar to each
other. But to the extent that the entries of a matrix
are constrained instead of free to vary independently,
the matrix can in principle be collapsed, without loss
of information, into a smaller set of numbers. To effect
such a collapse in the present case is tantamount to
uncovering the basic structure of the psychological
similarities underlying the diverse behaviors of subjects
with respect to a common set of stimuli.
The Physical Distance Between Stimuli

Now the N stimuli common to the various experiments
that have been considered can be selected from a
circumscribed domain defined by a small set of independent
physical variables. An auditory signal generator, for
instance, can be used to present sinusoidal tones of any
prescribed amplitude or frequency within certain limits.
The independent physical variables, then, enable one to
define a physical distance between stimuli, where by a
distance is meant, here, a measure D_ij associated with
every pair of stimuli S_i and S_j and satisfying the
following metric axioms:

I. The distance between any stimulus S_i and itself
is zero,

II. The length of one side of a triangle formed by
any three points S_i, S_j, and S_k cannot exceed the sum of
the lengths of the other two sides,
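
A minimal sketch of how the two axioms stated above could be checked on a matrix of distances follows. The function name, the tolerance, and the example matrices are assumptions made for illustration; only the two axioms that survive in the text are tested.

```python
from itertools import permutations

def satisfies_metric_axioms(d, tol=1e-9):
    n = len(d)
    # Axiom I: the distance between any stimulus and itself is zero.
    zero_diagonal = all(abs(d[i][i]) <= tol for i in range(n))
    # Axiom II: no side of a triangle exceeds the sum of the other two.
    triangle = all(d[i][k] <= d[i][j] + d[j][k] + tol
                   for i, j, k in permutations(range(n), 3))
    return zero_diagonal and triangle

# Hypothetical stimuli at positions 0, 1 and 3 on a single physical dimension.
on_a_line = [[0, 1, 3],
             [1, 0, 2],
             [3, 2, 0]]
# Same matrix with one entry inflated so that 5 > 1 + 2.
violating = [[0, 1, 5],
             [1, 0, 2],
             [5, 2, 0]]

print(satisfies_metric_axioms(on_a_line), satisfies_metric_axioms(violating))
```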


[...]er set of numbers, namely [...] though, the [...]
separated by a fixed physical distance is known to vary
widely depending upon the position and orientation of the
two stimuli in physical space. Thus a set of tones that is
equally similar to a given tone, psychologically, will not
in general describe a circle in physical space. Moreover,
the actual configuration of such a set will be liable to
extensive changes as the given tone is moved from one
location to another. Clearly, then, no one transformation
will suffice to convert any matrix of physical distances
into the corresponding matrix of behavioral data.


The Psychological Distance Between Stimuli

Thus far there are, on the one hand, a number of
matrices of behavioral data governed by the psychological
similarity of stimuli and, on the other, a matrix of
interstimulus distances based upon the objective physical
coordinates associated with the stimuli. No one of the
behavioral matrices seems more fundamental than the others
and, although they are all highly redundant, there seems
to be no way of reducing the number of entries so as to
be consistent with the number of degrees of freedom. And
again, although the matrix of physical distances is unique
and reducible, it cannot be expected to provide an accurate
account of the behavioral matrices. Thus efforts designed
to discover the psychological structure underlying the
similarity of stimuli seem to encounter a dilemma.
This dilemma might be resolved if a matrix of
psychological distances could be found that is like the
matrix of physical distances in that it satisfies the metric
axioms and reduces to a smaller set of numbers but is
unlike that matrix in that it provides an accurate account
of the behavioral data as well. The search for such a
matrix of psychological distances is invited by the
striking resemblance between the metric axioms and the
constraints inherent in the empirically determined
similarities. The fact that two stimuli that are
psychologically similar to a third must also be somewhat
similar to each other seems to parallel the metric
restriction, stated in the second axiom, that two points
that are both near to a third must also be moderately near
to each other. The initially disparate notions of
similarity and spatial proximity evidently reduce to the
same thing.

Now if physical and psychological distances were
metrically equivalent, sets of stimuli that are equally
similar to a given stimulus would have a circular locus
in physical space. But they do not. However, the reason
that they do not may be simply that, around any given
stimulus, subjects are more sensitive to variations along
certain directions and less sensitive to variations along
others. Thus the actual isosimilarity contours may depart
from circularity only by a kind of local stretching along
some axes and compression along others. But the class of
transformations that behave locally like a combination of 
axial expansions and contractions is the class of differ- 
entiable deformations. These transformations have the 
property that, in general, circles go over into differ- 
entiable closed curves and, in particular, small circles 
go over into ellipses (Klein, 1939, p. 73, pp. 104-105). 
Therefore this conclusion, that psychological distances 
are equivalent to physical distances except for a differ- 
entiable transformation, finds empirical support in the 
work of Silberstein (1943), MacAdam (1942), Brown (1951),
and others carried out in the domain of equiluminous 
colors; for they have shown that with sufficiently large 
values of the parameter, isosimilarity contours become 
elliptical in form. 

If the isosimilarity contours in physical space differ 
from circles only by a differentiable transformation, then 
application of the inverse transformation should carry the 
physical space into a psychological space in such a way
that the isosimilarity contours are circularized.
Consequently, the matrix of distances between N points in this
transformed space should provide a precise account of the 
various behavioral matrices discussed earlier. This 
matrix of psychological distances can therefore be taken 
as the fundamental matrix since its elements satisfy the 
metric axioms and, as will be seen, provide for a further 
condensation of the behavioral data. 


Multidimensional Scaling of Stimuli on the Basis of 
Behavioral Data 

From the present standpoint the problem of the relation 
between physical distance and behavior in the various
experiments that have been considered reduces to two
relatively independent subproblems. One, a psychophysical
problem, is to determine the nature of the unique function
that transforms physical distances into psychological
distances for any given domain of stimuli. Such a
psychophysical transformation may be quite complicated. With
respect to the domain of equiluminous colors, for example, 
present evidence indicates that no projective transforma- 
tion will suffice (MacAdam, 1942). Indeed, different 
types of transformation may be required for different 
domains of stimuli or even, as Stevens' distinction be- 
tween prothetic and metathetic continua suggests (Stevens, 
1957b; Stevens & Galanter, 1957), for different physical
dimensions within the same domain. 

Fortunately the other problem, referred to as the
psychological problem, can be investigated in the absence of
a general solution to the psychophysical problem. The
psychological problem is concerned only with the
determination of the unique function that transforms
psychological distances into behavioral data of a specified
type. By hypothesis, for any given type of data, this
function will be the same for every domain of stimuli and
for every dimension within a domain. Such a hypothesis,
if borne out, supports the notion that one central
mechanism controls a subject's discriminative responses to
any set of stimuli. The particularity of the several
psychophysical transformations, in contrast, presumably
reflects the diversity of the peripheral transducers
specific to the various physical continua.

The present problem, then, is to discover whether there 
exists a unique function, f, that transforms psychological 
distances into behavioral data of a certain kind. Actually, 
however, it is necessary to proceed in the reverse direc- 
tion since it is the behavioral data and not the psycho- 
logical distances that are initially available. In prin- 
ciple any postulated function, f, could be tested by 
applying the inverse function, f^-1, to the behavioral
data and, then, by determining whether the resulting 
presumptive distances satisfy the metric axioms. In 
practice the satisfaction of the metric axioms is most
conveniently assessed by, first, forcing the presumptive
distances to conform with those axioms and, then,
determining whether the original data can be reconstructed
on the basis of the "forced" distances. To the extent that
the reconstructed data match the original data it can be
concluded that the "forcing" is unnecessary and hence
that the presumptive distances already satisfy the metric
axioms.

Suppose, as a concrete illustration, that the empirical
data are the conditional probabilities from an experiment
on paired-associate learning (as outlined in the second
example at the beginning of this paper). Suppose, further,
that the N stimuli are confined to a relatively small
region of a K-dimensional physical space. The NxN
stochastic matrix, [...] pictorially represe[...]
Figure 4-1, [...] function to [...] a matrix D_ij [...]
sions. Thes[...] fore be redu[...] space. But [...] is to
force [...] be shown to [...] The triangle inequality, for
instance, then follows easily from the inequality of Schwarz
(Birkhoff & MacLane, 1955, p. 191). [...] to select,
empirically, among various choices for the function f by
determinin[...] bilities reconstructed [...] ordinates
approximate th[...]

The distances are red[...] ordinates and, hence, [...]ng
algorithm developed and successively refined by
Richardson [...] Householder (1938), Torgerson (1952), and
Messick and Abelson (1956). First, from the matrix D_ij of
presumptive distances, is deri[...] Then, corresponding to
the [...] calculated K Nx1 column matrices, [...] containing
the coordinates for the [...]thogonal axes in psychological
space.
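
Because the equations of Figure 4-1 are not recoverable here, the following is only a sketch of the general idea of passing from distances to scalar products to coordinates, in the spirit of the classical scaling procedures cited above. It is restricted to a single axis, uses power iteration in place of a full eigenvalue routine, and all names and example values are assumptions.

```python
import math

def scalar_products(d):
    # Double-center the squared distances:
    # b_ij = -0.5 * (d_ij^2 - row mean_i - column mean_j + grand mean).
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    col = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]

def leading_axis(b, iterations=200):
    # Power iteration for the largest eigenvalue and its eigenvector;
    # the coordinates are the eigenvector scaled by sqrt(eigenvalue).
    n = len(b)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]

# Hypothetical distances among three stimuli lying at 0, 1 and 3 on one axis.
distances = [[0.0, 1.0, 3.0],
             [1.0, 0.0, 2.0],
             [3.0, 2.0, 0.0]]
coords = leading_axis(scalar_products(distances))
print(coords)
```

The recovered coordinates are centered on zero, so only the distances between them, not their absolute positions, should be compared with the input.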
FIGURE 4-1. [...]

[...]sentation of these operations can be set down a little
more concisely by recasting the equations in matrix
notation. (In the figure, letters standing for matrices are
distinguished by bold-faced type.) For simplicity, the
following conventions have been adopted:

I = the NxN identity matrix with unit elements in the
    diagonal cells and zero elements in all others.
U = the NxN matrix with unit elements in all cells.
A' = the transpose of the matrix A.
diagA = the matrix with diagonal elements taken from A
    but with zero elements in all non-diagonal cells.
fA = the matrix obtained by applying the function f
    to each element of the matrix A.
[...] = the matrix obtained by raising each element of
    the matrix A to the power p.

The equation given in the figure for calculating the
presumptive distances, although theoretically correct
for infallible data, is somewhat modified in practice in
order to insure the necessary degree of statistical
stability for the spatial coordinates. For example, the
matrix D_ij is, in effect, averaged with its transpose
before computing the matrix of scalar products (Shepard,
1957). This can be regarded as the first step in forcing
the distances to satisfy the metric axioms, for it
follows from those axioms that the matrix of distances
should be symmetric. In addition, owing to the extreme
unreliability of the estimates for the large distances,
special procedures are ordinarily needed to readjust
those distances so as to be consistent with the small,
reliable distances (Shepard, 1957).
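
The averaging step just described can be sketched as follows. The sketch assumes, for illustration only, that the behavioral measures have been scaled to a unit diagonal and that the postulated function is f(d) = exp(-d), so that the inverse is -ln; neither the normalization nor the example matrix is taken from the source.

```python
import math

def presumptive_distances(s):
    # Assumed inverse function: f(d) = exp(-d), hence f^-1(s) = -ln(s),
    # applied to each element of the matrix.
    return [[-math.log(x) for x in row] for row in s]

def symmetrize(d):
    # Average the matrix with its transpose.
    n = len(d)
    return [[(d[i][j] + d[j][i]) / 2.0 for j in range(n)] for i in range(n)]

# Hypothetical, slightly asymmetric generalization measures with unit diagonal.
measures = [[1.00, 0.50, 0.20],
            [0.40, 1.00, 0.50],
            [0.25, 0.40, 1.00]]
sym = symmetrize(presumptive_distances(measures))
print(sym)
```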

As indicated in the figure, matrix equations can also 
be formulated for the reverse process of reconstructing 
the conditional probabilities from the coordinates. If 
the stimuli are confined to a sufficiently small region 
of physical space, then the number of column matrices of 
coordinates, K, will be equal to the number of dimensions 
of the physical space. Thus for the entire scheme illus- 
trated in the figure only the function, f, is unknown. 
The empirical problem, then, is simply to find the func- 
tion that will maximize the variance common to the orig- 
inal and the reconstructed matrix, Pag, of conditional 
probabilities. The working hypothesis, again, is that 
this function will not depend upon the set of stimuli 
selected. Indeed, evidence already collected in this way 
for sets of stimuli varying along a number of different 
physical dimensions shows that the function f in the case
of paired-associate learning is probably very close to a
simple exponential decay function (Shepard, 1958a; Shepard,
1958b). Evidently this relation is even preserved when
the stimuli vary along a short span of a single physical
dimension. This, of course, constitutes the severest
condition since the presumptive distances must then be
additive (in accordance with the limiting equality of
the second axiom), and since the N^2 conditional probabili-
ties must be recoverable from just N spatial coordinates.
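
A rough sketch of the one-dimensional case may help. It assumes the decay function has the form exp(-rate * distance), ignores any normalization that the figure's equations may apply, and treats the coordinates as already known; the names `reconstruct`, `common_variance`, and `best_rate` are hypothetical. The squared correlation over off-diagonal cells stands in for "the variance common to the original and the reconstructed matrix."

```python
import math


def distances(coords):
    """Distances between points on a single dimension."""
    return [[abs(a - b) for b in coords] for a in coords]


def reconstruct(coords, rate=1.0):
    """Apply an assumed exponential decay, exp(-rate * d), to each distance."""
    return [[math.exp(-rate * d) for d in row] for row in distances(coords)]


def common_variance(p, q):
    """Squared correlation between the off-diagonal cells of two matrices."""
    n = len(p)
    xs = [p[r][c] for r in range(n) for c in range(n) if r != c]
    ys = [q[r][c] for r in range(n) for c in range(n) if r != c]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def best_rate(observed, coords, candidates):
    """Pick the candidate rate whose reconstruction shares most variance."""
    return max(
        candidates,
        key=lambda rate: common_variance(observed, reconstruct(coords, rate)),
    )


coords = [0.0, 1.0, 3.0]                 # N = 3 coordinates
observed = reconstruct(coords, 0.7)      # stands in for N^2 observed values
rates = [i / 10 for i in range(1, 21)]
chosen = best_rate(observed, coords, rates)
```

Because the distances along one dimension are additive, an exponential function turns them into multiplicative measures: the value for the two outer points equals the product of the values for the two adjacent pairs.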


Once the correct form of the function f has been deter-
mined for any particular type of experimental data, the
spatial coordinates calculated from those data furnish
a scaling solution for the N stimuli. For any given set
of N stimuli, the scaling solutions may be essentially
the same for all of the different experimental procedures
initially outlined. This conjecture should be relatively
easy to test once the form of the function, f, has been
determined for each procedure separately. So far this
has been accomplished only for the paired-associate pro-
cedure. (However, Kellogg V. Wilson of Duke University
has recently been exploring the relations between some
of the other procedures.)

Insofar as the spatial coordinates derived from the
behavioral data are regarded as a multidimensional repre-
sentation of the stimuli, however, the methods described
here may be limited in application to stimuli that are
confined to a relatively small region of physical space.
Since psychological space presumably differs from physical
space by a differentiable deformation, the psychological
distances between widely separated stimuli may be non-
Euclidean. Actually there are two possibilities: (a)
The psychological space, though non-Euclidean, may still
be embeddable in a Euclidean space of a greater number of
[...] two-dimensional non-Euclidean [...] be isometrically
embedded in [...] the present procedure [...] separated
stimuli, [...] psychological coordinates [...]ical
dimensions of the stimuli. (b) The psychological
space may be of such a variety that it cannot be embedded
in a Euclidean space without doing violence to the psycho-
logical metric no matter how many dimensions are added.
In this case more general scaling methods will have to be
devised for sets of stimuli that are widely separated in
physical space. However, this problem is not specific
to the present procedure but applies, equally, to all of
the multidimensional scaling methods that have been pro-
posed to date.


5 Bert F. Green, Jr. 
Massachusetts Institute 
of Technology 


A Technical Note on the Method
of Successive Categories


Using Category Means 


The familiar form of the successive categories model
relates the stimulus means, $m_i$, and standard deviations,
$\sigma_i$, to the category boundaries, $t_g$, by the equation

(1)   $t_g - m_i = x_{gi}\,\sigma_i$,     g = 1, k-1,

where $x_{gi}$ is the normal deviate corresponding to $p_{gi}$,
the proportion of times stimulus i is placed in cate-
gories 1 through g, i.e., a cumulative proportion.

In Guilford (1957) and one or two other places the
procedure is modified by rewriting the model to use cate-
gory means rather than category boundaries. With this
procedure there is a score, namely the category mean,
for each interval, rather than placements of category
boundaries.
It is a mathematical fact that the mean of stimulus
distribution i in category g, say $u_{gi}$, is given by the
equation

(2)   $u_{gi} - m_i = \sigma_i \, \dfrac{y(p_{g-1,i}) - y(p_{gi})}{p_{gi} - p_{g-1,i}}$,     i = 1, N;  g = 1, k,

where $y(p)$ is the normal ordinate corresponding to the
cumulative proportion p. Note that the denominator on
the right is the proportion of times stimulus i was
placed in category g. The category mean for each
stimulus can be computed, and then averaged over stimuli
to give an average category mean. There may be some
troubles with this procedure, but I am not considering
that aspect. Rather, I am concerned with the situation
where equation (2) is turned into a model by removing
the i subscript from $u_{gi}$:

(3)   $u_g - m_i = \sigma_i \, \dfrac{y(p_{g-1,i}) - y(p_{gi})}{p_{gi} - p_{g-1,i}}$
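
Equation (2), the mean of a normal distribution within one category, can be checked numerically. The sketch below is an illustration under stated assumptions: the stimulus distribution is taken as normal with mean m and standard deviation sigma, the boundary values are arbitrary, and the function names are hypothetical. It uses only `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

STD = NormalDist()  # unit normal distribution


def ordinate(p):
    """y(p): normal ordinate at the deviate whose cumulative proportion is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # the ordinate vanishes at both extremes
    return STD.pdf(STD.inv_cdf(p))


def category_means(m, sigma, boundaries):
    """Return (proportions, means) for the categories cut at `boundaries`."""
    dist = NormalDist(m, sigma)
    cum = [0.0] + [dist.cdf(t) for t in boundaries] + [1.0]
    proportions, means = [], []
    for g in range(1, len(cum)):
        width = cum[g] - cum[g - 1]  # proportion placed in category g
        proportions.append(width)
        means.append(
            m + sigma * (ordinate(cum[g - 1]) - ordinate(cum[g])) / width
        )
    return proportions, means


props, means = category_means(0.5, 1.2, [-1.0, 0.0, 1.5])
weighted = sum(p * u for p, u in zip(props, means))
```

Weighting each category mean by its proportion and summing returns the stimulus mean itself, since the ordinates cancel in pairs. This is one way to see that the category means of a single stimulus are tied to its mean.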


Warren Torgerson and I were uneasy about this model and
have recently started looking at it closely. (In what
follows, let $\sigma_i = 1$ for all i.) First the two models
are different. Data that fit (1) exactly do not fit (3).
It would be nice to say that data fitting (3) would not
fit (1), but it turns out that there are no data that
fit (3).

The trouble is that the parameters $u_g$ and $m_i$ are not
independent in (3) while they are in (1). Thus for any
particular stimulus, given the parameters $m_i$ and $t_g$, a
consistent set of proportions can be found and, con-
versely, given a set of cumulative proportions, a set of
$t_g$ can be found for any $m_i$. Not so in (3). Given a set
of proportions there is only one set of $u_g$ and one $m_i$
that satisfy. There is one too many equations, so the
parameters are interdependent.


There are two minor facts that give a hint of the 
trouble: 


My Sm, Su 


uy cu 216. 


and 


Consider the case of three categories. Now by defini-
tion $p_{0i} = 0$ and $p_{ki} = 1$, so that $y_{0i} = y_{ki} = 0$. Also
the origin of the scale is arbitrary, so let $u_1 = 0$.
Then the three equations are

(4.1)   $-m_i = -\dfrac{y(p_{1i})}{p_{1i}}$

(4.2)   $u_2 - m_i = \dfrac{y(p_{1i}) - y(p_{2i})}{p_{2i} - p_{1i}}$

(4.3)   $u_3 - m_i = \dfrac{y(p_{2i})}{1 - p_{2i}}$


Now there are only two proportions, but three parameters.
Hence there is one (nonlinear) restriction among the
parameters. If we consider the parameters as known and
are trying to find the proportions, we note that the two
proportions $p_{1i}$ and $p_{2i}$ are determined from (4.1) and
(4.3). There is no reason why the remaining equation
(4.2) must then be satisfied. In fact, given $u_2$ and $u_3$
there is only one $m_i$ for which proportions can be found
to satisfy the three equations.
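
The over-determination in the three-category case can be illustrated numerically. The following sketch assumes unit standard deviations and the first category mean fixed at zero, and it reads the first and third equations as "normal ordinate over lower-tail proportion equals m" and "normal ordinate over upper-tail proportion equals the third category mean minus m." The bisection helper and the name `implied_u2` are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # unit normal distribution


def bisect(func, lo, hi, steps=200):
    """Root of a monotone function that changes sign on [lo, hi]."""
    sign_lo = func(lo) > 0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def implied_u2(m, u3):
    """Solve the first and third equations, then return the middle
    category mean that the second equation would require."""
    # First equation: ordinate / lower-tail proportion = m
    x1 = bisect(lambda x: STD.pdf(x) / STD.cdf(x) - m, -8.0, 8.0)
    # Third equation: ordinate / upper-tail proportion = u3 - m
    x2 = bisect(lambda x: STD.pdf(x) / STD.cdf(-x) - (u3 - m), -8.0, 8.0)
    p1, p2 = STD.cdf(x1), STD.cdf(x2)
    # Second equation, solved for the middle category mean
    return m + (STD.pdf(x1) - STD.pdf(x2)) / (p2 - p1)


required = implied_u2(1.0, 2.5)
```

Once m and the third category mean are chosen, the two proportions are fixed, and with them the only admissible value of the middle category mean. Any other value, for example 2.0 in this illustration, leaves the second equation unsatisfied.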


6 S. Smith Stevens 


Harvard University 


Ratio Scales, Partition Scales 


and Confusion Scales 


Modern efforts to quantify subjective magnitudes have
led to three types of scales--all three designed to show
how sensation grows as a function of stimulus input. It
turns out, however, that the ratio, partition, and confusion
scales all measure something different, and when all
three scales are applied to a particular prothetic con-
tinuum, the result is a set of inconsistent functions,
nonlinearly related one to another (Stevens, 1957b;
Stevens, 1959e). It has lately become clear why
inconsistency prevails, and why only one of the three
scales can be regarded as a proper and desirable measure
of subjective magnitude.

On [...] of subjective continua, called meta-
thetic, the three types of scales may [...] equivalent as
subjective measures. Metathetic continua--those that relate
to [...]. Examples are pitch, position, inclina-
tion, [...] and possibly visual saturation
([...] 1959). Metathetic continua are characterized
by the fact that the observer's sensitivity to
differences tends to be constant over the subjective
scale.

This research was supported by a grant from the National
Science Foundation and contract with the Office of Naval
Research (PNR [...]). Reproduction for any purpose of the
U. S. Government is permitted.

Prothetic continua are quite different. They are the
quantitative, intensitive continua which relate to how
much, and on them the observer's sensitivity to differ-
ences is good at the low end and poor at the high end of
the scale. Weber's law in its linear form,
$\Delta\phi = k(\phi + \phi_0)$, generally holds on prothetic continua, and, on
the more than two dozen of these continua recently ex-
plored (Stevens, in press), the law relating apparent
magnitude $\psi$ to stimulus magnitude $\phi$ is a power function:
$\psi = k\phi^n$. The exponent n varies over a tenfold range
from one sense modality to another. So persistent is
this finding that I have ventured to call it a general
psychophysical law (Stevens, 1957b). That the psycho-
physical law must of necessity have the power-function
form when both $\psi$ and $\phi$ are measured on ratio scales has
been suggested in a penetrating analysis by Luce (1959).
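
A power function plots as a straight line in log-log coordinates, with the exponent as its slope. The sketch below shows one simple way to estimate the constant and the exponent from a set of magnitude judgments. It is an illustration only: the data are synthetic, the least-squares fit in logarithms is an assumption about method rather than a description of the procedure used in the studies cited, and `fit_power_law` is a hypothetical name.

```python
import math


def fit_power_law(stimuli, magnitudes):
    """Least-squares line through (log stimulus, log magnitude).

    Returns (k, n) for the assumed relation magnitude = k * stimulus ** n.
    """
    xs = [math.log10(s) for s in stimuli]
    ys = [math.log10(v) for v in magnitudes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return 10 ** intercept, slope


# Synthetic, error-free judgments generated with exponent 0.6
stimuli = [1.0, 2.0, 4.0, 8.0, 16.0]
magnitudes = [3.0 * s ** 0.6 for s in stimuli]
k, n = fit_power_law(stimuli, magnitudes)
```

With real judgments the points scatter about the line, and the fitted slope is an estimate of the exponent rather than an exact recovery.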
Confusion Scales 


Under a profusion of different names and variations,
confusion scales have been more widely studied than any
other type of psychological scale. It started with
Fechner and his inspiration of October 22, 1850, when he
conceived the notion that by measuring resolving power--
just noticeable differences (jnd)--he could devise a
measure of sensation. Having determined the size of the
jnd, he proposed to count them off as he proceeded from
one end of the continuum to the other. The resulting
scale of jnd is essentially a confusion scale, in the
broad sense of the term, because the value of the jnd is
determined by the confusions an observer makes between
stimuli spaced various distances apart on the sensory
continuum.
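
Fechner's proposal to count off jnd can be sketched in a few lines. The example assumes that the jnd follows Weber's law in the linear form quoted earlier in this chapter, with a hypothetical Weber fraction of 0.1 and, for simplicity, a zero additive constant; the function names are illustrative.

```python
import math


def count_jnds(start, stop, k, phi0=0.0):
    """Step upward one jnd at a time; each step is k * (phi + phi0)."""
    count, phi = 0, start
    while phi + k * (phi + phi0) <= stop:
        phi += k * (phi + phi0)
        count += 1
    return count


def jnd_scale(phi, start, k, phi0=0.0):
    """Closed form for the (fractional) number of jnd between start and phi.

    Each step multiplies (phi + phi0) by (1 + k), so the count grows
    as a logarithm of the stimulus.
    """
    return math.log((phi + phi0) / (start + phi0)) / math.log(1.0 + k)


steps = count_jnds(1.0, 10.0, 0.1)
first_doubling = jnd_scale(2.0, 1.0, 0.1)
second_doubling = jnd_scale(4.0, 1.0, 0.1) - jnd_scale(2.0, 1.0, 0.1)
```

Equal stimulus ratios span equal numbers of jnd under these assumptions, which is the sense in which a scale built by counting jnd is logarithmic.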


[...] of this type of measurement [...]. Thurstone introduced
novel methods for assessing confusion (e.g., paired
comparisons and successive intervals), but the scales
produced were generally of the same genre as the original
Fechnerian variety. In other words, they continued to
be built on the "unitizing" of variability.

If, with Fechner--and with Thurstone in his "Case
V"--we assume that equal measures of confusion, resolv-
ing power, discriminal dispersion, variability, error,
sensitivity, discrimination, jnd, or whatever one chooses
to call it, represent equal distances on a scale of sub-
jective magnitude, then we have a means of erecting a
scale. The only difficulty is that on prothetic continua
this scale does not seem to measure what we would like
to mean by a subjective magnitude. The outcome would
mesh better with known facts if, instead of assuming
that the jnd is constant in subjective size, we were to
assume that the subjective distance corresponding to a
given measure of dispersion increases in proportion to
subjective magnitude. (This assumption might be called
Case VI.) But even this assumption would get us at best
only to a logarithmic interval scale (Stevens, 1959b),
and not to a ratio scale of psychological magnitude.
Despite Thurstone's explicit acknowledgment (1959,
p. 15) that his work on subjective measurement followed
strictly in the Fechnerian tradition, the close relation
between Thurstonian scaling and Fechnerian psychophysics
is not generally appreciated (Stevens, 1959c). This
stems partly from Thurstone's penchant for novel but
misleading terminology. A stimulus, he says, gives rise
to a discriminal process (meaning a subjective impression,
or a sensation). Repeated stimulation gives rise to a
discriminal dispersion (meaning a distribution of sub-
jective impressions). This distribution is said to be
"usually normal." By considering the distributions of
the inconsistencies obtained when observers judge which
of two stimuli is greater in some respect (paired com-
parisons), he arrived at what he first referred
to as the "fundamental psychophysical equation," but
which he later called the "law of comparative judgment."
In this "law," which he sometimes more appropriately
called the "equation of comparative judgment," Thurstone
stated a relation between distance on the psychological
continuum and discriminal dispersion--a relation that
makes it possible, given certain assumptions, to derive
a subjective scale from the distributions of judgments
of greater or less applied to pairs of stimuli. One set
of assumptions, called Case V, turns out to be the most
useful set in practice; and it contains, among other
things, the assumption that all discriminal dispersions
are equal. Another way of expressing this assumption is
to say that "equally often noticed differences are
psychologically equal," a statement that Thurstone called
[...] (Thurstone, 1959, p. 114). [...] to find where Fullerton
[...] making. The mathematical models pioneered by Thurstone
have been refined and reworked and extended in sophisti-
cated detail. As later work has shown, however, the
unitizing of variability, confusion, discriminal disper-
sion, etc., etc., is not a proper road to the measure-
ment of subjective magnitude on prothetic continua.

Partition Scales

The first instance of a psychological scale seems to
have been the category scale of stellar magnitude
(Stevens, 1958b). Before the days of photometry, men
looked at the heavens and judged the apparent brightness
of the stars on a scale from 1 to 6, where 1 stands for
the brightest star and 6 for the faintest. Successive
numbers on the scale were assigned to successive equal-
appearing intervals of stellar magnitude.

We next find a closely related method being used by 
Plateau in the 1850's to partition a brightness interval. 
He asked eight artists to paint a gray that would appear 
midway between black and white. This method of bisection 
was further developed by Plateau's friend Delboeuf (see 
Stevens, 1957b). 

Category scales and bisection scales have much in
common. Both require the observer to try to partition
a segment of a continuum into equal intervals. But, as
we shall see, observers are so constituted that they are
unable to partition a prothetic continuum without a
systematic bias. Under appropriate circumstances, ob-
servers have demonstrated their ability to partition
a metathetic continuum, but the prothetic continuum has
thwarted all attempts by the standard methods to divide
it into precisely equal intervals. Why is this so? The
answer seems to hinge on the fact that a person's sensi-
tivity to differences (measured in subjective units) is
not uniform over the scale--a fact related to Weber's law.
A given difference that is large and obvious near the low
end of the range is much less impressive in the upper
part of the scale. This asymmetry in the observer's
appreciation of differences produces a systematic bias
whenever he tries to effect partitions on a prothetic
continuum.
The bias in the category scale is such that attempts
at [...] scaling have sometimes seemed to confirm
Fechner's law--the erroneous law that says that psycho-
logical magnitude grows as the logarithm of the physical
magnitude. Thurstone himself did at least one such ex-
periment (1959, p. 98). Actually, however, when category
[...] the arti[...] stimulus spacing--a goal
that can be accomplished by a process of experimental
iteration--the category scale is found to be decidedly
[...] (Stevens & Galanter, 1957). As shown below, the
typical category scale is intermediate in form between
the jnd, or confusion, scale and the ratio scale of
subjective magnitude. The Munsell scale of neutral grays
is a category scale that provides another excellent
example of this fact (Stevens, 1958a).
It should be mentioned also that two other disadvan-
tages attach to category scaling. (1) It leads at best
only to an interval scale, a scale on which the zero
point is arbitrary. (It actually achieves a proper
interval scale only on metathetic continua.) (2) It is
highly sensitive to stimulus spacing and stimulus order.
This second difficulty can perhaps be circumvented by
experimental ingenuity, but the result on a prothetic
continuum will still be nonlinearly related to the ratio
scale of subjective magnitude.

Ratio Scales

By a ratio scale is meant a scale on which the zero
point is not arbitrary and on which ratios therefore have
[...]. Technically speaking, the scale form remains
invariant under multiplication by a positive constant,
and any more general transformation results in a loss of
[...]. The ratio scale is the most useful variety
[...], for it contains the interval [...] itself (as
well as the ordinal and the nominal scales). In
psychological measurement, the [...] method for assigning
numbers to [...] such a way that the numbers will
preserve information about ratios.

[...] in 1888, who sought to find a [...] a given
sensation, a variety [...]ling have been elaborated.
[...] present the ratio sc[...] classes: (1) magnitude
estimation; (2) magnitude production; (3) ratio
estimation; and (4) ratio production. Each of these
[...], that we can hope to see our measures converge on
an acceptable result. Since each [...] methods probably
contains biases [...] been described elsewhere (Stevens,
1958c), it may be well to say a word about the nature of
some of the procedures. The method called magnitude
estimation has lately come to be the most widely used,
for it is simple and straightforward, and it gives re-
sults that seem to stand up under cross-modality valida-
tion.

Magnitude estimation refers to a procedure in which
the observer makes direct numerical estimations of a
series of subjective impressions. As typically used,
the experimenter presents a stimulus of medium intensity
and tells the observer to call it 10. The observer is
told that a series of intensities will be given, and
that to each of them he should assign a number propor-
tional to the apparent magnitude, as he perceives it.
He should use any numbers that seem appropriate, [...]
numbers, fractions, or decimals. If a stimulus seems to
be five times as great as the original standard, he
should call it 50, if seven times as great, 70, if a
twentieth as great, 0.5, etc. The standard, called 10,
is usually presented only at the beginning of the series,
and each stimulus is usually presented twice in irregular
order. A different order is used for each observer, and
the results (usually for 10 or more observers) are
averaged by taking either a geometric mean or a median,
or preferably both.
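
The two averages mentioned above are easy to compute side by side. The sketch below uses hypothetical estimates from three observers for a single stimulus; the numbers are invented for illustration, and `summarize` is a hypothetical helper built on the Python standard library.

```python
from statistics import geometric_mean, median


def summarize(estimates):
    """Return (geometric mean, median) of one stimulus's estimates."""
    return geometric_mean(estimates), median(estimates)


# Hypothetical estimates: one observer halves the standard's value of 10,
# one matches it, one doubles it.
estimates = [5.0, 10.0, 20.0]
geo, med = summarize(estimates)
arithmetic = sum(estimates) / len(estimates)
```

Halving and doubling balance each other in the geometric mean but not in the arithmetic mean, which is pulled upward by the larger number. That is one reason a geometric mean suits judgments that are made in terms of ratios.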

Again, however, caution must be urged against reli-
ance on one version of one method. Just as a chemist
must at least interchange the weights in the pans of his
balance in order to guard against the possibility of a
systematic error, so must the psychologist undertake
checks and controls in any serious effort at measure-
ment. The principle is the same, but the psychologist's
task may be more difficult.


Examples of the Three Kinds of Scales

As already noted, on prothetic continua the magnitude
(ratio) scale, the category scale, and the jnd scale
take on systematically different forms. The magnitude
scale is a power function of the stimulus; the jnd scale
usually approximates a logarithmic function; and the
category scale tends to assume an intermediate form,
[...] depend[...] many secondary factors. Over the various
[...], these relations among the three kinds
of scales are astonishingly invariant. Typical examples
from representative continua are shown in Figures 6-1 to
6-4. The data are for judgments of apparent duration
(Stevens, 1957b), apparent thickness (finger span)
(Stevens & Stone, 1959), vibration on the finger tip
(Stevens, 1959a), and loudness (Stevens, 1957).

On these four continua the magnitude scale is a power
function with exponents as follows: duration, 1.1;
finger span, 1.3; vibration, 0.95; and loudness, 0.6
(re sound pressure, or 0.3 re sound energy).
For each of these four prothetic continua the jnd $\Delta\phi$
is related to stimulus magnitude $\phi$ by

$\Delta\phi = k(\phi + \phi_0)$

where $\phi_0$ is a constant. If it were not for the term $\phi_0$,
the scale obtained by counting off jnd would be a
logarithmic function of $\phi$. Actually, since $\phi_0$
is relatively small, it affects the form of the jnd scale
[...]. Note especially the semi-logarithmic
[...] of Figure 6-4 where the jnd scale is seen to be
[...] near threshold but nearly straight over the higher
[...].

As is clearly illustrated in Figures 6-1 to 6-4, the
category scale is a kind of compromise between the magni-
tude ratio scale and the jnd scale. If, on prothetic
continua, the observers were capable of performing in
an unbiased manner the partitioning that category scaling
calls for, the category scale would be a linear trans-
formation of the ratio scale. Obviously it is not. The
psychological process of partitioning is not itself the
difficulty here, because many of these same observers
made unbiased partitionings on metathetic continua. The
source of the distorting bias seems to reside in the
asymmetry of the observer's appreciation of differences
at the two ends of the prothetic scale.

FIGURE 6-1. Jnd Scale, Category Scale, and Magnitude Ratio Scale
for Apparent Duration.

Triangles: Mean judgments of 12 observers who estimated the
apparent duration of a white noise. Stimuli were presented in a
different irregular order to each observer. Circles: Mean category
judgments made by 16 observers on a scale from 1 to 7. The end
[...] presented at the outset to indicate the range, and each
[...] each duration twice in a random order. Stimulus
spacing was adjusted to produce a "pure" category scale. The dashed
[...]

Up to this point it may have seemed that the validity
[...] has been taken freely for granted
with little support offered for its candidacy. If, as
seems clear from Figures 6-1 to 6-4, each of the three
kinds of scales measures a different aspect of sensory
response, by what justification do we single out one
kind of scale as the most appropriate measure of subjec-
tive magnitude? Since each scale is probably a valid
measure of something, the question becomes: Which of
the several scales seems to measure what we would like
to mean by subjective magnitude? Which scale best
describes how a sensory impression grows with stimulus
input?

For a simple continuum like apparent duration (Figure
6-1) the answer [...] rather obvious to most
people. The results of magnitude estimation show that
a stimulus lasting two seconds appears psychologically
to be about half as long as a stimulus lasting four
seconds, a fact that seems eminently reasonable. The
jnd scale, on the other hand, suggests that a stimulus
lasting less than one second should appear to be half
as long as one lasting four seconds. To most observers
this will seem eminently unreasonable. The category
scale, of course, says nothing about ratios, and there-
fore it cannot be judged by this criterion.

FIGURE 6-3. Jnd Scale, Category Scale, and Magnitude Ratio Scale
for Apparent Intensity of 60-Cycle Vibration Applied to the Middle
Finger.

For details, see Stevens (1959a).

FIGURE 6-4. Jnd Scale, Category Scale, and Magnitude Ratio Scale
[...]

Triangles: median judgments by 70 observers who made magnitude
estimations [...] common modulus [...] level presented. The jnd
[...] Note that, [...]

The foregoing argument based on "reasonableness" will
not necessarily appeal to all scientists, especially those
who already feel uneasy about asking observers to make
subjective numerical estimates. For this reason, and also
because it is important to verify the dynamic input-output
operating characteristics of the sensory transducers, new
methods of checking the power-function law have been de-
vised. These methods, involving cross-modality compari-
sons, do not force the observer to make numerical esti-
mates of apparent magnitude, or to assess the numerical
values of apparent ratios.


Cross-Modality Validation

What Thurstone called "modern psychophysics" was none
other than the basic methods of classical Fechnerian
psychophysics adapted to the measurement of a new con-
tent--preferences, attitudes, and the like--stimuli that
are not ordinarily measurable on a physical scale. There
is now emerging a still newer psychophysics--ultramodern,
must we say?--which is doing just the opposite. It is
changing the methods, but taking a new look at the old
content. While rejecting the logic of the [...]
school, the reoriented psychophysics attempts, among other
things, to answer Fechner's central inquiry [...]
the growth of sensation as a function of stimulus input.
[...] less complex than the elaborate
[...] ingenious techniques for
processing variability into a "truly mental unit of
measurement," as [...] called it (1959, p. 115).
Instead of trying [...] scatter of data into
units of measure, [...] sensory systems are [...]
the resulting procedures, [...] directly by cross-modality
equations. If the relations among the various subjective
scales can be [...] to stand up under direct
intercomparison, their [...] validity seems more firmly
assured.

Cross-modality matching is merely an extension of the
[...] subjective matching with which psychophysics has
[...] earlier. Equal loudness contours are mapped
by asking observers to equate for loudness two t[...]
[...] frequency. Heterochromatic photometry [...]
they can match, say, the loudness of [...].

The argument can be outlined as follows. If two con-
tinua [...] power functions, we may write,

$\psi_1 = \phi_1^m$

$\psi_2 = \phi_2^n$

Then, if cross-modality matches are made, such that $\psi_1$
is equated to $\psi_2$, the resulting "equal sensation" match-
ing function will have the form

$\phi_1^m = \phi_2^n$

[...] continua, loudness and [...] specific experimental
outcome. Earlier experiments (Stevens, 1959a) have shown
that the subjective scale for vibration (60 cps) applied
to the finger tip is a power function of the amplitude,
with an exponent of about 0.95. Loudness, the most
thoroughly studied continuum of them all, grows as the
0.6 power of the sound pressure (Stevens, 1955). When
[...] at various levels, the [...] power function with an
exponent given by the ratio of 0.6 to 0.95. [...] shown
(Stevens, 1959a). In one experiment the sound was ad-
justed to match the vibration; in the other the vibration
was adjusted to match the sound. The circles and the
squares in Figure 6-5 show two interesting things.
Together they determine a line whose slope is about 0.6,
which is almost exactly the predicted slope (exponent).

FIGURE 6-5. Equal-Sensation Function for Loudness and Vibra-
tion.

Squares mean that the observers adjusted the vibration to
match the loudness; circles mean that they adjusted the loud-
ness to match the vibration. The slope of the line in this
log-log plot is that predicted by the ratio of the exponents of
the psychological scales for loudness and vibration.

[...] circles and squares determine two detect[...]
[...] lines, showing that it makes a slight dif-
ference which of the two stimuli is adjusted when a match
is being made. The situation is analogous to the two re-
gression lines of a correlation plot, and it points up
the need for a balanced experimental design in which each
stimulus is made to serve as both the standard and the
variable.
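
The predicted slope can be worked out directly from the two exponents quoted above. The sketch below assumes both scales are pure power functions with unit constants, so that equating the two sensations gives a straight line in log-log coordinates; the function names are hypothetical.

```python
import math


def matching_slope(exp_x, exp_y):
    """Slope of log y against log x when x**exp_x is equated to y**exp_y."""
    return exp_x / exp_y


def matched_value(x, exp_x, exp_y):
    """Value on continuum y judged equal to x (unit constants assumed)."""
    return x ** (exp_x / exp_y)


# Loudness grows as the 0.6 power of sound pressure and vibration as
# the 0.95 power of amplitude, as stated in the text.
predicted = matching_slope(0.6, 0.95)

# The same slope read off two points of the matching function
low, high = 10.0, 100.0
empirical = (
    math.log10(matched_value(high, 0.6, 0.95))
    - math.log10(matched_value(low, 0.6, 0.95))
) / (math.log10(high) - math.log10(low))
```

The ratio of 0.6 to 0.95 is a little over 0.63, which is the figure against which the observed slope of about 0.6 is being compared. Multiplicative constants in either power function would shift the line without changing its slope.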
Space does not permit the description of all the other
cross-modality comparisons that have been made, but let
us consider one more instructive example.
Comparisons with Handgrip

A continuum tha[...] we can determine (1) how the feeling
[...] relates to the stress exerted, and (2) [...]
observers try to report the apparent magnitude of other
sensations by squeezing the [...]. This second procedure
is analogous to magnitude estimation, except that instead
of emitting numbers the observer emits squeezes.

[...] has been studied in several experiments
(J. C. Stevens, J. D. Mack, & S. S. Stevens, in press;
J. C. Stevens & S. S. Stevens, in press), which have
produced the results shown in Figure 6-6. Two features
are immediately evident. All the data [...] straight
[...] listed in Table 6.1. [...] with numbers" (which is
[...]). Since the exponent [...], we may predict [...]
continuum in Table 6-1 [...] slope of the cor[...]
prediction is rather [...]. The greatest [...] in press).

TABLE 6.1. REPRESENTATIVE EXPONENTS OF THE POWER FUNC-
TIONS RELATING PSYCHOLOGICAL MAGNITUDE TO
STIMULUS MAGNITUDE ON PROTHETIC CONTINUA

Continuum           Exponent   Stimulus Conditions
Loudness            0.6        binaural
    "               0.55       monaural
Brightness          0.33       5° target
    "               0.5        point source
Lightness           1.2        reflectance of gray papers
Smell               0.55       coffee
    "               0.6        heptane
Taste               0.8        saccharine
    "               1.3        sucrose
    "               1.3        salt
Temperature         1.0        cold--on arm
    "               1.6        warm--on arm
Vibration           0.95       60 cps--on finger
    "               0.6        250 cps--on finger
Duration            1.1        white-noise stimulus
Repetition rate     1.0        light, sound, touch, and shocks
Finger span         1.3        thickness of wood blocks
Pressure on palm    1.1        static force on skin
Heaviness           1.45       lifted weights
Force of handgrip   1.7        hand dynamometer
Vocal effort        1.1        sound pressure of speech sounds
Electric shock      3.5        60 cps, through fingers

[...] about [...] experiments have suggested that
[...] handgrip [...] for white noise may be somewhat larger.
The [...] appears to confirm this suggestion.
The general agreement among these experiments encour-
ages one to subscribe to the basic validity of the ratio
scale of sensory magnitude.


Variability

[...] function is measurable with infinite
[...], indeed, is any other kind of empirical
[...]. Variability exists in all empirical measure-
ment, [...] a principal goal of the experimental art of
[...] is to narrow the bounds of dispersion. By
some standards, the responses that observers make to
stimuli are highly variable; by other standards, they may
be regarded as surprisingly consistent. The seriousness
of a given degree of variability is a relative matter.

By and large, the interquartile range of the numerical
estimates made when groups of observers undertake magni-
tude estimation on intensitive continua is of the order
of 0.2 to 0.3 log unit. (It may, of course, be lower for
continua that are easy to judge.) This variability con-
tains certain obvious components, however, the most
important of which appear to be the following.
1. Variability due to the observer's modulus, i.e.,
[...] standard. To the extent that we
are concerned only with [...] of the magnitude scale,
[...] of no concern. When
desired, it can be parti[...] in one way or another,
[...] in the over-all variability.

This component of variability is especially evident
in those experiments in which each observer is allowed
to choose his own modulus (Stevens, 1956). It also plays
a prominent role in cross-modality matches. Each observer,
for example, has his own conception of what force of
handgrip matches what level of loudness, but the absolute
values chosen by [...] irrelevant so far as
the form of the equal-sensation function is concerned.
It is only the relative values that matter. By and large,
our main concern is with the slop[...]
[...]cepts of these functions.

2. Variability due to the observer's conception of a
subjective ratio. In the assessment of relative magnitude,
each person must make up his own mind about what he
considers "half as loud," say, and not all observers
arrive at the same conclusion. (A plot showing an example
of the variability encountered in halving and doubling may
be found elsewhere (Stevens, 1957a).) Nothing much can be
done about this source of variability, except perhaps to
try to avoid systematic biases and constraints in the
conditions of observation (Stevens, 1956). In the absence
of systematic errors, our working hypothesis is that this
type of variability is due to "random" factors, which may
be canceled out by averaging the results from an unselected
sample of observers.


3. Variability [...] characteristics. This source of
variability is biologically the most interesting, [...]
minor magnitude in a group of "normal" observers.
Nevertheless, in the hard-of-hearing ear, or the night-blind
[...] power-function law.

[...] always hold rigorously (provided, of course, that no
errors arise in our measurements), or should we look for
second-order deviations from it? The data of particular
experiments [...] law, but in most instances it is not easy
to determine whether these defections are due to artificial
biases of one kind or another. Nevertheless, since the
possibility of genuine departures from the power law is a
problem of basic moment, an effort should be made to devise
procedures of sufficient accuracy to settle the issue. The
fact that the power law is closely approximated by so many
data from so many different sense modalities adds interest
and significance to any authentic departure from the
power-function law.


7 William McGill
Columbia University


The Slope of the Loudness Function: A Puzzle


In a series of recent papers Stevens (cf. 1956) has
shown that loudness can be measured by instructing S to
make numerical responses. The responses are then plotted
against sound intensities to form a relation that is
known as the loudness function.

Whatever else they are, the responses are not numbers.
They are responses. Stevens' procedure capitalizes on
our lifetime of commerce with quantification and
estimation. It assumes that a normal subject can use his skill
in estimation in order to evaluate an internal event
quantitatively. Consequently, when the measurement is
free of bias, the numbers are thought to represent
loudnesses more or less accurately.

But perhaps normal subjects seldom respond to the
intensity of sound in all the ways that numbers suggest.
The responses may look or sound numerical, and they can
often be plotted, but some sort of demonstration is
demanded if we are to believe that they also have
quantitative significance. On the other hand, even an obstinate
critic would agree that magnitude estimations of loudness
behave a bit like numbers. The thing that is not clear
is how much real quantification is achieved. Conceivably
magnitude estimations also involve processes allied to
learning, set, and concept formation. There seems to be
good evidence (Garner, 1954a; Garner, 1954b) that the
latter distort the numbering system but we do not know
what form these distortions take and how serious they are
in ordinary loudness measurements.

This research was supported by the United States Air Force

The substance of this paper is that it attempts to
show that certain loudness functions are "personal
equations." They reveal a good deal about the process
of estimation in the individual but not very much about
loudness, at least in their raw form. However, with some
effort they can be brought into correspondence with
Stevens' loudness function, and hence made to reveal
something of the process of distortion.


Apparatus and Method

The loudness functions were obtained with 1,000 c.p.s.
tones at various sound pressure levels. The tones were
generated by a Hewlett-Packard 200AB oscillator and
controlled in intensity [...] in 1 db steps.


[...] The left-hand end symbolized "the weakest [...]"
while the right-hand end [...] tone you could possibly
[...] on a separate line and prior [...] The lines were
numbered [...] rectified so as to [...] S used one counter
to keep his place on the rating sheet, while E used the
other to program the attenuator. [...] the pencil mark.
Two rating methods were used.


Method 1: Stimuli were chosen in 10 db steps from
30 db to 100 db. E ran through the eight tones in the
series in ascending order while S listened. During this
preview the tones were 1.5 seconds in duration and spaced
two seconds apart. Then E presented the same eight
intensities in random order now spaced six seconds apart
and S rated each one. The entire procedure was repeated
10 times.

Method 2: Sixty stimuli were chosen in 1 db steps
from 41 db to 100 db sound pressure level. The sequence
was presented five times, each time in a different
random order. No previews of any kind were given. All
tones were 1.5 seconds in duration spaced six seconds
apart.
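
The two presentation schedules can be sketched in a few lines of Python. This is only an illustration of the procedure as described: the function names, the tuple layout, and the random seed are assumptions, and the 1.5-second duration of the rated tones in Method 1 is assumed to match the preview, since the text does not state it.

```python
import random

def method1_session(rng):
    """One run of Method 1: ascending preview, then the same tones shuffled."""
    levels = list(range(30, 101, 10))            # eight tones, 30 to 100 db
    # Each entry is (level in db, duration in s, spacing in s).
    preview = [(db, 1.5, 2.0) for db in levels]
    rated = levels[:]
    rng.shuffle(rated)
    # Duration of the rated tones is an assumption (see the note above).
    trials = [(db, 1.5, 6.0) for db in rated]
    return preview, trials

def method2_session(rng):
    """One run of Method 2: sixty tones in random order, no preview."""
    levels = list(range(41, 101))                # 41 to 100 db in 1 db steps
    rng.shuffle(levels)
    return [(db, 1.5, 6.0) for db in levels]

rng = random.Random(0)                           # arbitrary seed
method1 = [method1_session(rng) for _ in range(10)]   # repeated 10 times
method2 = [method2_session(rng) for _ in range(5)]    # presented five times
```

Each S therefore contributes 10 ratings per intensity under Method 1 and 5 ratings per intensity under Method 2.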


Reproducibility of the Loudness Function

Loudness functions obtained from different individuals
using method 1 (described in the previous section) are
clearly different, although this is not surprising since
the judgments themselves are variable. The basic
question is whether the differences between individuals are
simply reflections of the "noise level" of the loudness
judgments, or whether they are reproducible parametric
differences.

Figures 7-1 and 7-2 illustrate our findings on this
point. They show two individual loudness functions and
redeterminations of the same functions after an interval
of approximately one week. The points on each curve are
averages of 10 ratings at each intensity. The
reproducibility of these loudness functions seven days or so
after the original determination is hardly perfect, but
it is not discouraging either. On the other hand, the
individual loudness functions seem to be quite different,
at least in the intensity range from 30 to 60 db.
Accordingly we can infer that the principal differences
between subjects are not random but reflect stable
parameters.

The two loudness functions illustrated are taken from
12 cases in which we have tried this sort of
reproduction. The complete data follow the pattern just
described. The functions are highly stylized but
reproducible. The differences are most evident at low
intensities. They are a little bit like a signature or
a thumbprint in helping us to identify the subject.
Statistical Properties of Loudness Ratings

It is by no means surprising to find stable
differences between subjects in situations where they make
estimates of magnitude, [...] Garner (1954a) [...] ever
presented. Ten subjects [...] procedure with the outcome
shown in Figure 7-3. Intensities [...] db intervals.
Consequently [...] 7-3 is an average taken over [...] the
average rating are also indicated by the dashed-line
envelope around the curve. The judgmental variability
evidently depends on intensity, increasing as intensity
increases. Estimation at low intensities [...] loudness
functions [...] with the single qualification that as the
loudness function [...] flatter, the relation between [...]
is weaker. Hence it seems that variability [...] rather
than to intensity.
FIGURE 7-4. One of the Loudness Functions Shown in Figure 7-3,
Replotted on an Arithmetic Loudness Scale. Dotted Envelope
Is the Region Enclosing Plus or Minus One Standard Deviation
Around the Mean Rating.

The foregoing seems to indicate that log loudness will
lead to uniform variability. The transformation, however,
is not quite adequate in most cases. On log loudness
coordinates the scatter of the judgments at the low
intensities seems to expand; just the reverse of the
condition seen on the arithmetic plot. The few cases
that are made linear by the log transform have uniform
variability in log loudness.

These observations taken together suggest that
whatever produces linearity on the log-log plot of the
individual functions also produces uniform variability. It
happens that both conditions can be corrected
simultaneously by adding a constant to the ratings given by an
individual with a curved loudness function, i.e., by
treating the loudness functions as interval scales.
Hence our method of loudness estimation seems to have
produced interval judgments. The loudness functions,
compensated for differences in "zero point," are given
in Figure 7-5, and the individual curves are now seen to
be linear. The computed constants represent the
displacement of each S's reference point to the left of our
reference point. The adjustments are designed to produce
linearity if the loudness functions are power functions.
Our method is crude but it leads to a unique adjustment
for each S, and the constant does straighten out the
individual loudness functions. The idea is that a power
function will produce equal loudness ratios corresponding
to equal decibel differences. For example,


$$\frac{L_{43} + L_0}{L_{68} + L_0} = \frac{L_{73} + L_0}{L_{98} + L_0},$$

where $L_0$ is the displacement of the reference point, $L_{43}$
is the loudness rating corresponding to 43 db and the
other terms are defined analogously. The constant is
estimated from

$$L_0 = \frac{L_{68} L_{73} - L_{43} L_{98}}{L_{43} + L_{98} - L_{68} - L_{73}}.$$

This quantity is computed for each S and added to the
ratings shown in Figure 7-3 in order to produce the
adjusted ratings plotted in Figure 7-5.
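
As a numerical check on the linearity criterion, the following Python sketch solves the equal-ratio condition for the additive constant. It is an illustration under stated assumptions: the function name is hypothetical, and the ratings are synthetic, generated from a power function with slope .30 and then shifted by a known amount so that the recovered constant can be compared with it.

```python
def zero_point_constant(l43, l68, l73, l98):
    """Solve (l43 + c) / (l68 + c) == (l73 + c) / (l98 + c) for c."""
    denom = l43 + l98 - l68 - l73
    if denom == 0:
        raise ValueError("the four ratings do not determine a unique constant")
    return (l68 * l73 - l43 * l98) / denom

# Synthetic ratings (assumption): loudness grows as intensity to the power .30,
# i.e. 10 ** (0.03 * db), and S's reference point is displaced by 4 units.
true_shift = 4.0
ratings = {db: 10 ** (0.03 * db) - true_shift for db in (43, 68, 73, 98)}

constant = zero_point_constant(ratings[43], ratings[68], ratings[73], ratings[98])
print(round(constant, 6))   # recovers the shift of 4.0 built into the data
```

Adding the recovered constant to every rating restores equal ratios for equal decibel differences, which is what makes the adjusted function linear on log-log coordinates.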
A variation of Thurstone's criterion
(Thurstone, 1928) may be used to find the individual
reference points. Thurstone defined the zero point as
the scale value for which variability disappears. His
procedure requires a relation between loudness and
variability similar to that observed in the present
experiment and illustrated in Figure 7-6 for two S's
with very different loudness functions. To find the
reference point, a linear equation is set up between
$L$ and $\sigma_L$. Then $\sigma_L$ is made arbitrarily small and the
equation is solved for the corresponding value of $L$.
By experimentation we have found that a value of $\sigma_L$ = [...]
produces constants similar to those computed from the
linearity criterion (Figure 7-5). The loudness
functions adjusted by Thurstone's criterion are shown in
Figure 7-7. Incidentally, Thurstone's criterion applied
literally leads to a reversal in the curvature of the
loudness function, i.e., it "over-corrects." The
approximate correspondence of the two criteria is the basis for
our statement that the adjustments required for linearity
and for uniform variability are essentially the same.
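
The variability criterion can be sketched in the same spirit. The fragment below fits a straight line relating the standard deviation of the ratings to the mean rating and solves it for the rating at which the standard deviation falls to a chosen small value. The function names and the data are hypothetical, ordinary least squares is an assumed choice of fitting method, and the target value of the standard deviation is left as a parameter because the source does not preserve the value actually used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def reference_point(mean_ratings, sds, sigma_target):
    """Rating at which the fitted standard deviation equals sigma_target."""
    slope, intercept = fit_line(mean_ratings, sds)
    return (sigma_target - intercept) / slope

# Hypothetical data: the standard deviation grows linearly with the rating.
means = [2.0, 5.0, 9.0, 14.0, 20.0]
sds = [0.2 * (m + 5.0) for m in means]

literal_zero = reference_point(means, sds, 0.0)    # variability vanishes
relaxed_zero = reference_point(means, sds, 0.1)    # small but nonzero target
print(literal_zero, relaxed_zero)
```

With a nonzero target the estimated reference point lies closer to the observed ratings than the literal zero of variability, which is the direction needed to soften the over-correction described above.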


Slopes and Anchors

In outline the previous section suggests that our
individual loudness functions are interval judgments and
can be converted into power functions by an adjustment of
the zero point. However, the slopes of these adjusted
functions are different and this is a puzzle. It is hard to
believe that the sensory process changes so radically
from one subject to another. Apparently something else is
involved. This "something" seems to transform the sensory
process, and the transformation affects both the slope
of the function and the location of the reference point.
In Figure 7-8 the nature of the transformation is shown
more clearly. The estimated displacement of the "zero
point" is plotted as a function of the slope of the
resulting loudness function. The constants in Figure 7-8
are derived from the Thurstone criterion but this is not
critical. The linearity criterion produces a similar
relation. In any case the function in Figure 7-8
strongly indicates that the transformation of the
loudness function involves an evaluative mechanism that has
similar properties for all subjects. It behaves like a
ruler that can be stretched. As it stretches, the units
spread out. The sensory event is held up against this
ruler and read off in terms of its rubbery units. Our
data indicate that most S's agree on where to put the
[...]-hand edge of the ruler but they disagree about its
length.

FIGURE 7-8. Empirical Relation Between Two Parameters of
Individual Loudness Functions. The Estimated Displacement of
the Zero Point Is Plotted as a Function of the Slope of the
Resulting Loudness Function (Variability Criterion). Each
Point Represents One Individual.

The rubber-scale phenomenon was first described by
Rogers (1941) and McGarvey (1943), who showed that it is
produced by an anchoring stimulus. McGarvey demonstrated
that the subjective evaluation scale stretches to
accommodate the anchoring stimulus and that the scale units
spread out as the scale stretches.

There is no explicit analogue of the anchoring
stimulus in the present experiment, unless it is the
silent interval. But the data seem to indicate that the
differential conditioning [...] differential [...]

Stevens (1955; 1956; Stevens and Galanter, 1957) has
[...] evidence and has added a good [...] loudness
function follows a cube root [...] slope of the function
is .30 on the log-log coordinates. This loudness function
emerges [...] reference point corresponds [...] point used
for evaluating his ratings, the slope of the [...]
function is found to be approximately .30, as Stevens
maintains.


8 Paul F. Lazarsfeld
Columbia University


Latent Structure Analysis and Test Theory


Section 1


The foundations of test theory can be put in the
following form. A sample of testees has answered a test
consisting of n dichotomous items. A score s has been
assigned to any respondent with s correct answers. There
is, however, assumed to exist a perfect test which would
assign each testee a "true score" t if he gave t correct
answers. The score values s and t form a bivariate
distribution. The conditional distribution of s given a
fixed value of t is assumed to [...] measurement. One
important purpose of test theory is to infer the position
of a testee on the t-axis when his score s is known. This
can only be done if we make some mathematical assumptions
about the bivariate s-t distribution. In some models
additional assumptions are made relating the [...] of the
"true" items to the average covariance between the
items actually observed. Often special attention is
given to the marginal distribution of [...]. A test theory
model can essentially be described by three
considerations:

1) How much data are used? Usually one will find only
$(n+1) + n + n(n-1)/2$ used as empirical data. This
would happen if the model considered the n+1 frequencies
of the marginal s-distribution; the n "item difficulties"
or proportions of correct responses; and the $n(n-1)/2$
covariances between items. Often much less actual
information is used.

2) What axioms or mathematical assumptions are
introduced into the theory?

3) Which characteristics of the bivariate s-t
distribution can be deduced from 1) and 2)?

Questions 2) and 3) are obviously related. The more
we put into the model under 2), the more will we be able
to deduce, given the same amount of empirical data.
As a matter of fact, a test theory model is really a
theory of transformation, in this case from the s-axis
to the t-axis.


* I omit certain variations which come up if it is
assumed that the "true" and the actual tests can consist
of a different number of items; this leads to [...]
scores and requires further assumptions on the sampling
of items.


Latent Structure Analysis (LSA) can be looked upon as
a generalization of test theory models. Using the
preceding three aspects the differences are as follows:

1) LSA starts with the full joint distribution of [...]
the manifest space [...] of s-scores [...] by item
difficulties [...] pieces of information.

[...] can be arranged in a latent space which will have
much fewer dimensions than [...]. The latent axes may
[...] explained presently.

[...] manifest data the number of model parameters which
can be computed is very [...] a distribution of people
over the latent space which corresponds to the marginal
distribution of "true scores" [...] a set of conditional
distributions which correspond to the "error of
measurement." However, the LSA model is much more complex and
[...] test theory model can be derived from it as a
special [...].

[...] shall first elaborate on this overcondensed
statement and then give a concrete example.

LSA is then based on the following ideas:

(a) An intended (latent) continuum is assumed.¹

(b) A number of dichotomized manifest items are
introduced. Each item, i, has a probability $p_i$ to be
answered "positively" at any point x of the latent
continuum.

(c) The function $p_i = f_i(x)$ is called operating
characteristic (o.c.) in test theory and traceline in LSA. In
deference to the Princeton host group we shall hereafter
use their term (o.c.), although I prefer mine as more
descriptive and colloquial.

(d) A principle of local independence is assumed: At a
fixed point x the probabilities for joint occurrence are
the products of the separate probabilities.

(e) [...] the term "the signature" means the sequence of
(+) and (-) responses given in a specific case. If a
negative response probability is designated by a barred
index ($\bar{p}_i = 1 - p_i$) then the principle of local
independence can be written in the form

(1)    $f_\sigma = f_i f_j \bar{f}_k \ldots,$

where $\sigma$ is a sequence of indices, [...] etc.

(f) Equation (1) is the o.c. (traceline) of the response
pattern $\sigma$. It indicates the probability of getting a
response pattern of signature $\sigma$ at any point of the
latent continuum. [...] set of items which satisfies the
[...] independence can be called a pure test.

(g) To any response pattern $\sigma$ there correspond
recruitment [...]. This is a function which indicates
the reversed probabilities that the response $\sigma$ was given
by a respondent at point x.

(h) [...] of people at point x. [...] no assumption is
made that $\phi(x)$ be normal.

(i) The reversed probabilities are represented by the
function

$$\psi_\sigma(x) = \frac{f_\sigma(x)\,\phi(x)}{\int f_\sigma(x)\,\phi(x)\,dx} = \frac{f_\sigma(x)\,\phi(x)}{p_\sigma}.$$

¹For the sake of simplicity we shall think of it as
unidimensional. This restriction will be removed in
Section 7.
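
The ideas listed above can be made concrete with a small numerical sketch. The Python fragment below computes the o.c. of a response pattern under local independence and the corresponding reversed (recruitment) probabilities by replacing the integral with a sum over a few latent points. The tracelines, the latent points, and their weights are hypothetical values chosen only for illustration, and the function names are not taken from the text.

```python
from itertools import product

# Hypothetical linear tracelines for two items.
TRACELINES = [lambda x: 0.5 + 0.2 * x, lambda x: 0.4 + 0.1 * x]
# Hypothetical discrete latent distribution: points x and their proportions.
GRID = [-1.0, 0.0, 1.0]
DENSITY = [0.25, 0.50, 0.25]

def pattern_oc(signature, x):
    """Probability of the response pattern at x, assuming local independence."""
    prob = 1.0
    for f, sign in zip(TRACELINES, signature):
        p = f(x)
        prob *= p if sign == "+" else 1.0 - p   # barred index means 1 - p
    return prob

def recruitment(signature):
    """Discrete reversed probabilities and the manifest proportion p_sigma."""
    weights = [pattern_oc(signature, x) * w for x, w in zip(GRID, DENSITY)]
    p_sigma = sum(weights)
    return [w / p_sigma for w in weights], p_sigma

for signature in product("+-", repeat=2):
    psi, p_sigma = recruitment(signature)
    print("".join(signature), round(p_sigma, 4), [round(v, 4) for v in psi])
```

Each pattern yields a whole distribution over the latent points rather than a single position, and the manifest proportions of all patterns add up to one.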

Obviously [...] but $p_\sigma$ and $\phi$ are unknown [...]
pattern we can give scores, by a [...] for instance, [...]
(average) recruitment place. Or it [...] is, the point x
at [...] has a maximum.

[...] the implications of the three points just made.
Every response pattern can come from anywhere on the
latent continuum but with different probabilities, [...]
to each $\sigma$ there corresponds a complete distribution of
reversed probabilities. A score is some kind of average
[...] which [...] Scores can [...] including a [...]


Section 4 


Against this general background, LSA proceeds in the
following steps:

(a) It investigates models with algebraically specified
[...], e.g., polynomials with undetermined coefficients
[...] assumed degree. The distribution $\phi(x)$ might be
algebraically prescribed [...] or left completely free.

(b) LSA then develops the conditions of reducibility.
This means the restrictions imposed upon manifest data
if a chosen model were applicable.

(c) A given set of data are examined to see whether they
could "reasonably" come from the chosen model. (Sampling
problems will not be discussed in this paper.)

(d) If the chosen model is eligible, its latent
parameters are computed.

(e) If not, a new model is tested.

To give a very simple example we pick a test of work
satisfaction, answered by 876 employees of a company
(see Table 8-1). We now go through the four steps just
mentioned.

(a) We chose linear tracelines $f_i = a_i + b_i x$. The
distribution $\phi(x)$ is free. It is required, however, that
$\phi(x) = 0$ wherever $f_i(x) < 0$ or $f_i(x) > 1$.

(b) One of the reducibility conditions of this model is
as follows. The matrix of expressions $p_{ij} - p_i p_j = [ij]$
has to have rank one. Other conditions are that

$$\frac{[ijk]}{p_{ij} - p_i p_j}$$

be the same for a given k irrespective of the choice of
i and j.

(c) The matrix of [ij] has a reasonably good fit to
rank 1. The original matrix and the residuals from a
fitted matrix are given in Tables 8-2 and 8-3. In this
model the latent parameters can be computed from second
and first order data. We at first do not worry about
higher order data.

(d) It can be shown that $a_i = p_i$ and
$b_i = \sqrt{[ig][ih]/[gh]}$, which is likewise invariant against
changes of g and h. For $\phi(x)$ we cannot compute the first
two moments and therefore arbitrarily set the mean of the
distribution at zero and the variance as one. The third
moment turns out to be

(4)    $m_3 = \dfrac{[ijk]}{b_i b_j b_k}$ [...]

[...] has to have the same value whatever three items
i, j, k we choose; this is the reducibility condition
mentioned before but [...]

$m_3 = 1.57$, $m_4 = 5.47$. We shall discuss these values
[...]

TABLE 8-1


     Question                                Positive Response     Proportion

1.   Are there any things about your         "A lot of things"        .34
     job that you particularly like?

2.   Are there any things about your         "None" and               .57
     job that you particularly               "not many"
     dislike?

3.   How often do you look forward           "Every day"              .62
     with some pleasure to your day          and "almost
     on the job?                             every day"

4.   If someone asked you about getting      "Encourage her"          .48
     a job like yours, which of the
     following would you be inclined
     to do? Encourage her? Discourage
     her? Neither?

5.   Do you ever feel you would like         "Never"                 [...]
     to quit and get a job with some
     other company?

6.   Do you feel that you would like         "Seldom"                 .58
     to get a transfer from your             and "never"
     present job to some other kind
     of job in your department?


TABLE 8-2. CROSS PRODUCT MATRIX

         1       2       3       4       5       6

1       ---    [...]    062     069     057     029
2      [...]    ---     080     088     077     050
3       062     080     ---     107     088     054
4       069     088     107     ---     103     061
5       057     077     088     105     ---     058
6      [...]   [...]   [...]    061     058     ---


TABLE 8-3.

         1       2       3       4       5       6

1       ---    -006     005     005     002    -001
2      -006     ---     002    -001     001     005
3       005     002     ---    -001    -005    -002
4       005    -001    -001     ---    -002    -002
5       002     001    [...]   -002     ---     005
6      [...]   [...]   [...]   -002     003     ---


(e) To get an idea of the fit of the model we compute
the fifth order frequencies as they derive from the model
and compare them with the manifest data (see Table 8-4).
The results are not too bad.


TABLE 8-4. ACTUAL AND FITTED JOINT FREQUENCIES
OF ITEM QUINTUPLETS

Item                  Joint Frequencies
Combination         Actual        Fitted

12345               [...]          .129
12346                .132         [...]
12356                .115          .115
12456                .112          .105
13456                .119         [...]
23456               [...]          .178
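
The algebra behind steps (b) and (d) can be verified on artificial data. The Python sketch below builds exact manifest proportions from linear tracelines and a two-class latent distribution with mean zero and variance one, then recovers the slopes and the third moment from the cross products. All numerical values are hypothetical and are not the parameters of the work-satisfaction example; the third-order cross product is written out as the usual third central cross moment, which is an assumption about the notation [ijk].

```python
from math import sqrt, isclose

# Hypothetical latent distribution: mean 0, variance 1, third moment 1.5.
POINTS = [(-0.5, 0.8), (2.0, 0.2)]          # (x, proportion of people)
A = [0.34, 0.57, 0.62, 0.48]                # hypothetical intercepts a_i
B = [0.15, 0.20, 0.18, 0.22]                # hypothetical slopes b_i

def joint(items):
    """Manifest proportion answering all listed items positively."""
    total = 0.0
    for x, weight in POINTS:
        prob = weight
        for i in items:
            prob *= A[i] + B[i] * x         # linear traceline f_i(x)
        total += prob
    return total

def cross2(i, j):
    """Second-order cross product [ij] = p_ij - p_i p_j."""
    return joint([i, j]) - joint([i]) * joint([j])

def cross3(i, j, k):
    """Third-order cross product (assumed form of [ijk])."""
    pi, pj, pk = joint([i]), joint([j]), joint([k])
    return (joint([i, j, k]) - pi * joint([j, k]) - pj * joint([i, k])
            - pk * joint([i, j]) + 2.0 * pi * pj * pk)

def slope(i, g, h):
    """b_i = sqrt([ig][ih] / [gh])."""
    return sqrt(cross2(i, g) * cross2(i, h) / cross2(g, h))

b0 = slope(0, 1, 2)
b0_other_pair = slope(0, 2, 3)              # same value from other items
m3 = cross3(0, 1, 2) / (slope(0, 1, 2) * slope(1, 0, 2) * slope(2, 0, 1))
print(b0, b0_other_pair, m3)
```

Because every [ij] equals $b_i b_j$ in this model, the cross product matrix has rank one, the slope estimate does not depend on which auxiliary items g and h are used, and the ratio in equation (4) returns the third moment of the latent distribution.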
To get a graphic picture of the o.c.'s for the six
items we draw Figure 8-1.

[...] discrete located classes. We find that 99.9% of
all [...] equally at the points x [...] while [...] is
quite high. [...] would be to find the [...] moments.

Section 5 


In traditional test theory a person's score is the
[...]. Such a score has [...] derived simply [...]
calculus if the o.c. of each single item [...]. With
three items, for example, the score 2 can be obtained
from the following patterns: ++-, +-+, -++. The
probability for a respondent to return any of these [...]

(5)    [...]

where the sum is taken over all possible combinations of
[...] items. This point has been made by Tucker (1946),
Lord (1953b), and others, to which I shall refer as the
"Princeton tradition."
Once it is clear that a manifest test score has an
o.c. of the type of equation (5) all the rest of the
statements in Sec. 2 apply automatically. Especially
important is the fact that to a manifest test score s
there correspond a whole distribution of latent
recruitment probabilities. A latent (or, if you please, true)
score can only be established after we have decided
whether we mean by it the expected, the most likely, or
another "average" latent position.

Suppose, to fix our ideas, we choose the expected
latent position of people who have a manifest score s.
We can then raise the question of how s is mathematically
related to this "true" score. This will of course be
different for each model and is usually a very involved
mathematical form. Even for such a simple model as the
one with linear o.c.'s it is too long to write down.

Just to give an idea of the numerical relations, we have
chosen two items from our example and have graphed the
o.c.'s for the four response patterns ++, +-, -+, --.
The graph gives the o.c.'s for the four response patterns
and the latent scores:

$\mu_{++} = .835$;  $\mu_{+-} = .3$[...];  $\mu_{-+} = -.455$;  $\mu_{--} = -.859$.

They are the mean (expected) values for the reversed
probability distribution, $\psi_\sigma(x) = f_\sigma(x)\,\phi(x)/p_\sigma$. The latter
cannot be fully graphed because we know only a few
moments of $\phi(x)$. (See Figure 8-2.)

To compute the latent position for the traditional
test score we have to combine for s = 1 the pattern +-
and -+. It can be easily seen that this is

$$s_1 = \frac{p_{+-}\,\mu_{+-} + p_{-+}\,\mu_{-+}}{p_{+-} + p_{-+}}.$$

Obviously

$s_2 = \mu_{++} = .835$

$s_0 = \mu_{--} = -.859$.

While in this case the distances $(s_2 - s_1)$ and $(s_1 - s_0)$
are [...] similar, they are not equal. In more complex
situations a Pearson correlation between traditional
score and latent score would be far from one.

For each test score there exists a recruitment
probability function $\psi_s$, the properties of which can be
examined. For example, one may figure out the values of
$\int \psi_s(x)\,dx$ and so decide which score has the highest
probability to locate a person within a given interval
around the expected position, which could well be called
the reliability of this specific score s.
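
The relation between a traditional score and its expected latent position can be illustrated numerically. The Python sketch below sums pattern o.c.'s to obtain the o.c. of each score and pools the pattern means, weighted by their manifest proportions, to obtain the expected latent position for each score. The tracelines and the three-class latent distribution are hypothetical, so the numbers will not match those quoted in the text.

```python
from itertools import product

# Hypothetical latent distribution: (point x, proportion of people).
GRID = [(-1.0, 0.3), (0.0, 0.4), (1.0, 0.3)]
# Hypothetical linear o.c.'s for two items.
ITEMS = [lambda x: 0.5 + 0.3 * x, lambda x: 0.4 + 0.2 * x]

def pattern_oc(signature, x):
    """o.c. of a response pattern under local independence."""
    prob = 1.0
    for f, sign in zip(ITEMS, signature):
        prob *= f(x) if sign == "+" else 1.0 - f(x)
    return prob

def pattern_stats(signature):
    """Manifest proportion p_sigma and expected latent position mu_sigma."""
    weights = [pattern_oc(signature, x) * w for x, w in GRID]
    p_sigma = sum(weights)
    mu_sigma = sum(x * wt for (x, _), wt in zip(GRID, weights)) / p_sigma
    return p_sigma, mu_sigma

def patterns_with_score(s):
    return [sig for sig in product("+-", repeat=len(ITEMS)) if sig.count("+") == s]

def score_oc(s, x):
    """o.c. of the score s: sum of the o.c.'s of all patterns with s positives."""
    return sum(pattern_oc(sig, x) for sig in patterns_with_score(s))

def score_position(s):
    """Expected latent position of respondents with manifest score s."""
    stats = [pattern_stats(sig) for sig in patterns_with_score(s)]
    total = sum(p for p, _ in stats)
    return sum(p * mu for p, mu in stats) / total

positions = [score_position(s) for s in range(len(ITEMS) + 1)]
print(positions)
```

In this artificial case the three positions are ordered as the scores are, but the two steps between adjacent scores are unequal, which is the point made above about the distances $(s_2 - s_1)$ and $(s_1 - s_0)$.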


Section 6 


When I developed the mathematics of LSA during the
war, I was not aware of the paper by Tucker (1946). I
have since come to the conviction that as far as
application to test theory goes there are great similarities
between LSA and the Princeton tradition. The
differences consist mainly in a broader algebraic formulation
of LSA, which does not require normality for $\phi(x)$ and
permits the investigation of any kind of operating
characteristics (tracelines). A collateral advantage is a
greater emphasis on the reverse probability functions
$\psi(x)$, which should be studied in their own right.

I see only one disagreement which might be based on a
misunderstanding. In the Princeton tradition the latent
probabilities $p_i(x)$ of a given item i are not directly
linked to the underlying continuum. For each item an
additional latent element $z_i$ is introduced; this, in turn,
is correlated with the latent continuum, x. There are,
so to say, n+1 latent variables, one for each of the n
items and one for the intended general classification.
These n specific variables $z_i$ seem to me mere ghosts. I
have the impression that all papers in the Princeton
tradition could be simplified by defining directly for any
item i

$$p_i = \int f_i(x)\,\phi(x)\,dx$$

without going through the n additional latent abilities
$z_i$ which determine whether a person can or cannot answer
the item i correctly. But I raise that more as a question
because I might misunderstand the purpose of the Princeton
assumption.


Section 7 


Polynomial o.c.'s of any degree and the moments of a
free $\phi(x)$ can be completely determined. This permits in
one special case to exemplify the problem of latent
dimensionality. Suppose we want to decide for a special
set of data whether the appropriate model is
unidimensional with quadratic tracelines:

(Model A)    $f_i(x) = a_i + b_i x + c_i x^2,$

or two-dimensional with a linear trace surface:

(Model B)    $f_i(u,v) = a_i + b_i u + c_i v,$

where now u and v are two latent continua.

It can be shown that the two models have a large 
number of reducibility conditions in common. For example, 
the matrix of cross-products in both cases has rank 2. 
Certain ratios of determinants formed from joint fre- 
quencies on various levels are constant and equal for 
both levels. Only if one forms a so-called ascending 
determinant which includes fourth order frequencies does 
a difference appear. A typical example is as follows: 


* By P2 Pa Ps Pos 
Py Py . R 2 Posh 
Ps E . 

A= ! 
Pis H . 
P6 + P2346 
Pe Dee * = . P2356 


Such a determinant vanishes in the case of the
one-dimensional quadratic model but does not vanish if the
appropriate model has two latent dimensions and a linear
trace surface. Obviously the sampling problem of such a
decision is very serious. Professor T. W. Anderson, of
Columbia University, and one of his students have made
progress on the sampling aspect but it will not be
discussed here.


9 Frederic M. Lord
Educational Testing Service


A scale called the "ability underlying the test," or
more briefly, the "ability scale" or "ability score," was
[...] Psychometric Monograph No. 7 (Lord, 1955).
This scale is the same as that which Professor Lazarsfeld
has more descriptively called the latent continuum.
Professor Lazarsfeld's earlier work was [...]
attitude tests and with questionnaires of various types--
a particularly difficult area, it seems to me, because
the trace lines of test items in this area will not in
general assume any single, simple [...]
aptitude and achievement tests, on the other hand,
empirical results show that the assumption of a normal
ogive trace line gives a good fit for free-response
items; a slightly less simple trace line should provide
a good fit for multiple-choice items. In the special
case where the trace lines are normal ogives and
ability is normally distributed in the group tested, the
ability scale or latent continuum is the same as the
common factor of the inter-item tetrachoric correlations.
[...] the ability score of an individual
[...] assuming a normal distribution of ability
[...] been worked out by [...] more especially
[...] (1951a; 1953; 1955), who [...] an empirical study
applying [...] method to actual test data. Recently
Professor Birnbaum (1957a; 1957b; 1957c) has derived some
[...] along this line and has simplified [...]

[...] material treated here has served in part as a basis
for other publications (Lord, 1959e; Lord, in press, a).

Today my subject concerns true scores, not ability
scores. These are frequently confused--and not
unnaturally, since true score and ability score are perfectly
correlated. However, they have a curvilinear, not a
rectilinear, relationship. This fact is immediately
apparent when it is considered that the ability scale
extends from -∞ to +∞, whereas the true-score scale
is bounded above and below, at least for any ordinary
If the examinee's observed 
answers, for example, then 
and the true-score scale 


The examinee's 
the avera, 


same true-score scale, However, 
perfect curvilinear corre- 


difficulties, for 


stretch the true-score scale 
that of the other. 


One vocabulary tes 
vocabulary test. 


On this scale no 
administered. Th: 
achieved, however 
about the nature of the tes 
[The difference between true score and] observed score is, by
definition, the error of measurement. It is assumed that the error
of measurement is normally distributed with a mean of zero and a
variance that is experimentally determinable, e.g., by administering
two parallel forms of the same test to the same individuals. In
particular, the frequency distribution of the errors of measurement
is assumed to be independent of the true score of the examinee.
Given these assumptions and an experimentally determined value for
the variance of the errors of measurement, it is easy to estimate
the correlation between observed score and true score, and further
the linear regression coefficient of true score on observed score.
The classical procedure for estimating the true score of a given
examinee is then simply to substitute the examinee's observed score
into the linear regression equation for predicting true score from
observed score.
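
The classical procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the five-examinee data are hypothetical, and the error variance is taken as half the variance of the difference between two parallel forms, which presumes errors that are uncorrelated across forms and equally variable on each.

```python
from statistics import mean, pvariance


def classical_true_score_estimates(form1, form2):
    """Linear-regression estimates of true score from observed scores on form1.

    form1 and form2 are observed scores of the same examinees on two
    parallel forms (hypothetical data supplied by the caller).
    """
    # Var(x1 - x2) = 2 * Var(e) when errors are uncorrelated across forms.
    differences = [a - b for a, b in zip(form1, form2)]
    error_variance = pvariance(differences) / 2.0

    observed_mean = mean(form1)
    observed_variance = pvariance(form1)

    # Slope of true score on observed score = Var(t) / Var(x).
    slope = (observed_variance - error_variance) / observed_variance
    estimates = [observed_mean + slope * (x - observed_mean) for x in form1]
    return estimates, slope


form1 = [10, 12, 15, 18, 20]
form2 = [11, 11, 16, 17, 20]
estimates, slope = classical_true_score_estimates(form1, form2)
print(slope)
print(estimates)
```

Every estimate is pulled toward the group mean by the same proportion, which is exactly the linearity that the objections below call into question.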
Although this procedure may be quite adequate in many practical
situations, it is open to two serious theoretical objections. In
the first place, although the regression of observed score on true
score must necessarily be linear, there can be no possible reason
why the other regression--the regression of true score on observed
score--must be linear. It is this latter regression that is the one
required. To the extent that the frequency distribution of true
scores in the group tested deviates from normality, to that extent
the regression of true score on observed score may deviate from
linearity. In extreme cases, this regression may become sharply
nonlinear, as illustrated in Figure 9-1, which is intended to
represent a rather peculiar group of examinees half of whom have
true scores at t_i, and half at t_j. The frequency distribution of
the observed scores (denoted by x) for each half of the group is
represented in the figure by an approximately normal curve, which
must be visualized as perpendicular to the page. Since the average
error of measurement is zero, the mean of each normal curve lies on
the 45-degree broken line, which is therefore the regression of
observed score on true score. It is graphically evident from the
figure that the other regression--the regression of true score on
observed score--must look like the heavy ogive-shaped line and must
be very sharply curvilinear. This ogive-shaped line is the
regression needed for estimating the true score of a given examinee
(if a regression line is to be used at all).
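
The ogive-shaped regression can be computed directly for the two-valued case. The sketch below is illustrative only: it assumes normally distributed errors with the same spread in both halves of the group, and the true-score values 10 and 20 and the error standard deviation 2 are hypothetical choices, not values taken from the figure.

```python
import math


def expected_true_score(x, t_low, t_high, error_sd):
    """Mean true score among examinees whose observed score is x.

    Assumes half the group has true score t_low and half t_high, with
    normal errors of standard deviation error_sd in both halves.
    """
    weight_low = math.exp(-0.5 * ((x - t_low) / error_sd) ** 2)
    weight_high = math.exp(-0.5 * ((x - t_high) / error_sd) ** 2)
    return (t_low * weight_low + t_high * weight_high) / (weight_low + weight_high)


for x in range(6, 25, 2):
    print(x, round(expected_true_score(x, 10.0, 20.0, 2.0), 3))
```

The printed values stay near 10 for low observed scores, rise steeply around the midpoint, and level off near 20, which is the sharply curvilinear shape described in the text.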


FIGURE 9-1. Regression (heavy line) of True Score (t) on Obtained
Score (x) When True Score Has a Dichotomous Frequency Distribution.


y : Trors of measurement will 
etrically distributed and will not be dis- 
tributed ind 


mating the frequency 
; res, and 
a estimating th dual exam- 
nees, ons just raised 
Only a minimum 


requires 
Let x_{ua} be the observed score of examinee a on test u. Let
t_{ua} denote the corresponding true score. The error of
measurement is, by definition,

(1)    e_{ua} = x_{ua} - t_{ua}.

The only assumption to be made about the errors of measurement is
that they are chance variables, each with a mean value that is
always zero, irrespective of the value of the true score, or of the
value of any other error of measurement.

It is next assumed that we have available several 
parallel forms of the same test--parallel in the sense 
that the true score of any examinee, and also the joint 
frequency distribution of his errors of measurement, 
remain the same no matter which forms of the test are 
administered. 

Starting with only these two assumptions, it is possible to obtain
good estimates of the shape of the frequency distribution of true
scores, and also of the true score of any given examinee. The shape
of the joint frequency distribution of true scores and errors of
measurement can also be estimated by this same procedure. Note that
no assumption has been made about the shape of the frequency
distribution of errors of measurement, nor has it been assumed that
the shape of this distribution is independent of the value of the
true score. Another interesting point is that the term "true score"
has not been defined for purposes of the present discussion.
Whatever can be deduced about true scores from these assumptions
follows solely from the fact that the true score is the difference
between the observed score and the unknown error of measurement.
This approach may possibly prove useful in avoiding confusing
discussion about the existence and nature of true scores and about
the impossibility of ever ascertaining them operationally.

The data available to the practical worker are the moments of the
distributions of the observed scores on the various parallel tests.
It will be assumed throughout that the number of examinees in the
group tested is so large that sampling fluctuations due to sampling
of examinees can be ignored. The following results will at first be
very familiar.


The expected value (that is, the average value over all tests) of
the observed score of examinee a is his true score:

(2)    E x_{ua} = E(t_a + e_{ua}) = t_a,

since the expected error of measurement is zero and since all tests
have the same true score. The first moment of the observed scores
of all examinees on test u is \bar{x}_u = (1/N) \sum_a x_{ua}, and
the expected value of this first moment over all tests is

(3)    E \bar{x}_u = (1/N) \sum_a E x_{ua} = (1/N) \sum_a t_a = \bar{t},

the mean of the true scores of the group of examinees tested.
Similarly, for the second moment,

(4)    E (1/N) \sum_a x_{ua}^2 = (1/N) \sum_a E(t_a + e_{ua})^2
                               = M_{20} + M_{02},

where M_{20} is the second moment of the true scores in the group
of examinees tested and M_{02} is the second moment of the errors
of measurement in the same group. If the origin of measurement were
chosen at the mean of the true scores, so that all these moments
became moments about the mean, then (4) is essentially the familiar
equation stating that the variance of observed scores is equal to
the variance of the true scores plus the variance of the errors of
measurement. The expected value of the first bivariate moment (the
sum of the cross-products) between the observed scores on two
parallel tests, u and v, is, finally,

(5)    E (1/N) \sum_a x_{ua} x_{va}
           = E (1/N) \sum_a (t_a + e_{ua})(t_a + e_{va})
           = (1/N) \sum_a E(t_a^2 + e_{ua} t_a + e_{va} t_a + e_{ua} e_{va})
           = M_{20}.

Equation (3) shows that the mean observed score is an unbiased
estimate of the first moment of the true score. Equation (5) shows
that the first bivariate moment of the observed scores on two
parallel tests is an unbiased estimate of the second moment of the
true scores. It is therefore not surprising to learn that the first
trivariate moment of the observed scores on three parallel tests is
an unbiased estimate of the third moment of the true scores:

(6)    E (1/N) \sum_a x_{ua} x_{va} x_{wa} = M_{30}.

As a matter of fact, it is quite obvious that in general the first
k-variate moment of the observed scores on k parallel tests is an
unbiased estimate of the k-th moment of the true scores.
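
The estimation of true-score moments from products across parallel forms can be checked numerically. The following is a hedged sketch with simulated data: the three true-score values, the uniform error distribution, and the rule making the error spread grow with true score are all hypothetical choices, made only to show that the errors need not be normal or independent of true score.

```python
import random


def product_moment(forms):
    """Mean, over examinees, of the product of each examinee's scores on k forms."""
    n = len(forms[0])
    total = 0.0
    for a in range(n):
        product = 1.0
        for form in forms:
            product *= form[a]
        total += product
    return total / n


def simulate_parallel_forms(true_scores, k, rng):
    """Hypothetical data: zero-mean errors whose spread grows with true score."""
    forms = []
    for _ in range(k):
        form = []
        for t in true_scores:
            half_width = 1.0 + t / 10.0
            form.append(t + rng.uniform(-half_width, half_width))
        forms.append(form)
    return forms


rng = random.Random(1958)
true_scores = [rng.choice([5.0, 10.0, 15.0]) for _ in range(20000)]
forms = simulate_parallel_forms(true_scores, 3, rng)

m2_true = sum(t ** 2 for t in true_scores) / len(true_scores)
m3_true = sum(t ** 3 for t in true_scores) / len(true_scores)
m2_est = product_moment(forms[:2])   # bivariate moment, two forms
m3_est = product_moment(forms)       # trivariate moment, three forms
print(m2_true, m2_est)
print(m3_true, m3_est)
```

In a real application the true scores are of course unknown; they appear here only so that the estimates can be compared with the quantities they are meant to estimate.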


À similar Situation 
errors of Measurement a 
“rue score. Once we 


as a good approximation to the true-score distribution. If more
than four moments can be obtained, then these can be used to
represent the distribution in the form of a Charlier series or an
Edgeworth series, or more simply, as a histogram with k+1 class
intervals.


Table 9-1 shows four estimated true-score distributions together
with the corresponding observed-score distributions. The estimated
true-score distributions are actually beta distributions fitted to
the first four estimated true-score moments. The estimated
distributions [...] what would be expected, and they will not be
discussed further.

The estimated [...] determine from these, for example, the sign of
the correlation between the true score and the variance of the
errors of measurement. This correlation is negative for group H and
positive for group L. These
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TABLE 9-1. ESTIMATED FREQUENCY DISTRIBUTION OF TRUE 
SCORES, TOGETHER WITH THE OBSERVED SCORE DISTRIBUTION, 
FOR FOUR GROUPS OF EXAMINEES


bone SS SS EE 
Group H Group M Group L 


Ben os m e "S o "B m T5 

?2 4 51 

21 9 150 7 ab 

20 32 12 202 267 5 

19 52 38 202 290 16 d 

18 65 6T 151 198 AL 9 i 

1T 85 92 109 105 5h — 55 1 

16 105 108 65 46 9s. TT 2 

15 98 115 35 17 118 124 5 1 

14 93 115 20 6 119 162 9 3 

15 95 107 9 a 130 177 16 6 

12 86 ah 5 i18 165 29 15 

11 76 78 Ý 114 126 lið 31 

10 65 61 80 7 72 58 
9 L7 45 26. 350 . 9e. 98 
8 32 31 34 10 114 159 
7 22 19 19 L 124 173 
6 19 EE T 130 179 
5 10 5 4 114 149 
if 6 2 T 109 EU 
5 5 69 4 
2 2 H3. lal 
1 iT ib 
o 5 


1000 1000 1000 1002 1000 1001 1000 998 


are the results to be expected if the variance of the errors of
measurement becomes small as the true score [...] possible score or
as the true score [...]
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core of a single individual 
whose observed Score is known. Only the least-squares 
Procedure will be mentioned here, 
If the 


For example, if the regression 
Proximated by a thi 
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estimated moments of the joint frequency distribution of 
true score and error, as illustrated in equation (9): 


(9)    E \sum_a x_{ua}^3 t_a = E \sum_a t_a (t_a + e_{ua})^3
           = E \sum_a (t_a^4 + 3 t_a^3 e_{ua} + 3 t_a^2 e_{ua}^2 + t_a e_{ua}^3)
           = N(M_{40} + 3M_{22} + M_{13}).

In equation (9), M_{40} is the fourth moment of the true-score
distribution; M_{22} and M_{13} are bivariate moments
of the joint distribution of true score and error. It
may be noted in passing that the bivariate moments 
between true score and error will not in general be 
equal to zero because of the fact that the error distri- 
bution is not independent of true score. 

By means of other equations like (9), the numerical 
values that go on the left side of equation (8) can be 
obtained. These equations are then solved to determine 
the coefficients that belong in equation (7). Equation 
(7), finally, can be used to predict the true score of 
any examinee in the group tested from his observed score. 
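
A least-squares polynomial of this kind can be sketched as follows. This is not the computation of equations (7) through (9) as such: instead of expanding the left-hand moments in terms of joint moments of true score and error, the sketch estimates the mean of x_u^k t by the mean of x_u^k x_v on a second parallel form, which is justified only by the assumption that each error has mean zero irrespective of the other errors. The function names and the small solver are hypothetical helpers.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_true_score_polynomial(form_u, form_v, degree=3):
    """Coefficients c[0..degree] of the polynomial predicting true score from x_u."""
    n = len(form_u)
    # Moments of the observed scores on form u (directly observable).
    power = [sum(x ** k for x in form_u) / n for k in range(2 * degree + 1)]
    # Mean of x_u^k * t, estimated here by the mean of x_u^k * x_v (assumption).
    cross = [sum((xu ** k) * xv for xu, xv in zip(form_u, form_v)) / n
             for k in range(degree + 1)]
    matrix = [[power[j + k] for k in range(degree + 1)] for j in range(degree + 1)]
    return solve_linear_system(matrix, cross)


def estimate_true_score(coefficients, x):
    """Evaluate the fitted polynomial at observed score x."""
    return sum(c * x ** k for k, c in enumerate(coefficients))
```

As a sanity check, when the two forms contain no error at all the fitted polynomial reduces to the identity, so each observed score is returned as its own true-score estimate.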

In closing, let me point out again that these results have been
obtained from just two assumptions: (i) that each error of
measurement has an expected value of zero, no matter what the value
of the true score or of other errors of measurement; (ii) that
several parallel forms of the test (or parallel fractions of the
test) can be constructed and administered.
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Measurement of Utility 


and Subjective Probability 


Theories about how people make decisions in risky or uncertain
situations have come to focus on two concepts: utility, or
subjective value, and subjective probability. The status of the
utility concept is fairly clearly defined (Edwards, 1954c). But the
status of the subjective probability concept is quite confused,
both theoretically and in relation to experiments.

The focal question is this: Should the subjective probabilities of
a set of mutually exclusive events, one of which must happen, add
up to one? The answer yes has distressing consequences; so does the
answer no. This paper examines these sets of consequences, and
attempts to evaluate both answers in the light of the available
data.


Maximization and [...]

First, some definitions and assumptions. This paper is concerned
with decisions of the following kind. A decision maker can choose
one and only one of two or more courses of action. Associated with
each course of action is a finite set of possible outcomes, one and
only one of which will happen if that course of action is chosen.
Each possible outcome can be described by a number, called a
utility, defined at least up to a linear transformation (i.e.,
measured at least on an interval scale); this number fully
represents the desirability of that [outcome] to the decision
maker. If more than one possible [outcome] is associated with a
given course of action, then a [...]

The work reported here was partially supported by the Willow Run
Laboratories, University of Michigan; and partially by the Air
Force Personnel and Training Research Center, Air Research and
Development Command.


ision 
On-making situations. De 
lon maker has partial con 


t 
<8., a man shooting at a targe 
Ones which May not fit. 


me. Dreze (1958) has Written a sophisti- 
cated account of h 


s and other compli- 
an be included Within simple decision 
theories like thos 


[...] Four kinds of quantities which might be maximized are defined
by the following four equations:

(1)    EV = \sum_i p_i V_i

(2)    EU = \sum_i p_i u_i

(3)    SEV = \sum_i ψ_i V_i

(4)    SEU = \sum_i ψ_i u_i

These equations each refer to a course of action which has a number
of possible outcomes. The ith outcome has a money value V_i, to
which in equations (2) and (4) corresponds a subjective value or
utility, u_i. The ith outcome will occur if a given external event
occurs; that event may or may not have an objective probability,
p_i. Whether or not the external event has an objective
probability, models (3) and (4) assert that it has a subjective
probability ψ_i.
All four equations take a set of numbers representing values
associated with a given course of action, multiply each by its
associated probability, and sum the products. Each is the basis for
a model which asserts that the course of action which will (or
should, depending on how you interpret the model) be chosen is the
one for which the resulting number is largest. Equation (1) defines
the concept of expected value (EV); in many gambling situations
maximization of EV also maximizes long-run money gains (or
minimizes losses). Equation (2) defines expected utility (EU); the
EU maximization hypothesis was originally proposed by Daniel
Bernoulli (1738) and was revived in 1944 by Von Neumann and
Morgenstern (1947; see also Edwards, 1954c). This revival started
the interest in decision theory which has been growing among
economists and psychologists ever since. Equation (3) defines a
concept which I have named subjectively expected value (SEV); so
far as I know, its only serious use has been by Preston and Baratta
(1948). Equation (4) defines a concept which I have named (Edwards,
1955) subjectively expected utility (SEU); the remainder of this
paper discusses various versions of a model in which SEU is
maximized.
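
The four quantities differ only in which values and which probabilities enter the sum, as a short sketch makes plain. The utility function and the subjective probability function used below are hypothetical placeholders chosen for illustration; nothing in this paper specifies their form.

```python
def weighted_sum(weights, values):
    """Multiply each value by its associated weight and sum the products."""
    return sum(w * v for w, v in zip(weights, values))


def four_models(money_values, objective_ps, utility, subjective_p):
    """Return EV, EU, SEV, and SEU for one course of action."""
    utilities = [utility(v) for v in money_values]
    subjective_ps = [subjective_p(p) for p in objective_ps]
    return {
        "EV": weighted_sum(objective_ps, money_values),
        "EU": weighted_sum(objective_ps, utilities),
        "SEV": weighted_sum(subjective_ps, money_values),
        "SEU": weighted_sum(subjective_ps, utilities),
    }


# A bet that wins $4 with probability 1/4 and loses $1 otherwise.
bet = ([4.0, -1.0], [0.25, 0.75])
result = four_models(
    bet[0], bet[1],
    utility=lambda v: 2.0 * v,        # hypothetical linear utility
    subjective_p=lambda p: p ** 0.8,  # hypothetical, non-additive SP function
)
print(result)
```

With an identity utility and an identity subjective probability function, all four quantities coincide with EV.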
From here on, this paper will use some abbreviations. The word
"maximization" will be omitted from phrases like "the SEU
maximization model." The concept of objective probability will be
symbolized OP; SP will stand for subjective probability. The
concept of a set of events, one of which must happen and not more
than one of which can happen--i.e., the concept of an Exclusive,
Exhaustive set of Events--will be symbolized EEE in text. This
paper is concerned with two classes of SEU models. Those in which
the SPs of an EEE must add up to a constant will from here on be
called ASEU (additive subjectively expected utility maximization)
models. Those in which the SPs of an EEE do not need to add up to
anything in particular (though of course the OPs of an EEE must
always add up to one) will be called NASEU (non-additive
subjectively expected utility maximization) models.
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N x! =x, This paper will eas 
than the mathematical names 
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f. Becker & Siegel, 
Siegel, 1956; 
ollow are 


©, however 8PPlieable to models 
like Coombs ' , which assume y RE continuous utility 
» but piPerimenta]], T 
order Properties 
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of a finite Subset of Points from these 
functions Coombs & Beardslee, 1954 ) 
One of the uses to which ag 


models can p is that 
of serving ag & basis for Probability theo SCH 
Savage (1954) nag à 4 


L. J. 
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Probabilities which 
ith SPs, S paper will 


some events » like the 
have well-establisheg ir 
can meaningfully be compared w 
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Permissible Transformations on Utilities and SPs in ASEU 
and NASEU Models

Most measures have some degree of arbitrariness; they permit
various transformations which do not affect the meaning of the
scale values obtained. Consideration of permissible transformations
in ASEU and NASEU models leads to

Theorem 1: In any ASEU model, utility must be conceived of as
measurable at least on an interval scale and SP must be conceived
of as measurable at least on a ratio scale. In any NASEU model,
utility and SP must both be conceived of as measurable at least on
ratio scales.

The proof of this theorem begins with the fact that if a decision
maker chooses course of action A in preference to course of action
B, this means that the SEU of A is higher than the SEU of B. The
class of permissible transformations on the measures which enter
into the SEUs is defined by the fact that such transformations may
not make the SEU of B higher than that of A. Traditionally,
utilities are measured on an interval scale. Assume that SPs are
measured on a ratio scale and translate the content of this
paragraph into symbols.

The permissible transformations are

(5)    u' = au + b;    ψ' = cψ,

where a, b, and c are constants, and a and c are greater than zero.

The fact that permissible transformations may not reverse the
preference of A over B means that

(6)    \sum_i ψ_{iA} u_{iA} > \sum_i ψ_{iB} u_{iB}
       implies
       \sum_i ψ'_{iA} u'_{iA} > \sum_i ψ'_{iB} u'_{iB}.

Substituting the transformations in equation (5) into the second
half of equation (6) produces

(7)    \sum_i cψ_{iA}(a u_{iA} + b) > \sum_i cψ_{iB}(a u_{iB} + b).

Multiplying and collecting terms,

(8)    ac[\sum_i ψ_{iA} u_{iA} - \sum_i ψ_{iB} u_{iB}]
           + bc[\sum_i ψ_{iA} - \sum_i ψ_{iB}] > 0.

[...] U values, [...] difference is guaranteed [...] positive, since
the definitions of A and B say that A is preferred to B.
Consequently, the first term of inequality (8) must be positive.

Now consider the [...] If the quantity inside [...] any value
different from zero, a value of the constant b (which may be
negative) can be found which will reverse the direction of the
inequality and so violate the requirement expressed in inequality
(6). Something is wrong. What? A closer look at the second term of
inequality (8) shows [that the] quantity [in] brackets is the
difference between two [...] which is the sum of the SPs of an EEE.
[...] OPs, [...] numbers would be [...], and so the difference
[would] be zero, the second term [would] disappear, and the problem
would be solved. [...] property for SPs is assumed in a SEU
maximization model, then the sum of [...] also the same constant in
[...] drops out and the [...] of an EEE do not [...] constant, then
the [...] the second term equal zero is to require [...], that is,
to require that utility be meas[ured ...] requiring that the
constant b take on [...] which make the second term [of] inequality
(8) positive. But SEU models permit indifference. If a decider is
indifferent between A and B, then inequalities (6) and (8) become
equalities. In that case, permissible transformations like those
proposed in equation (5) must not change the equalities into
inequalities. This can be assured only if the second term in
inequality (8) is always zero.) The two ways of making the second
term of inequality (8) zero correspond to the two possibilities
described in the statement of Theorem 1.
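
The argument behind Theorem 1 can be illustrated numerically. In the sketch below the bets and their utilities and SPs are invented for illustration: when the SPs of each course of action add up to the same constant, adding a constant b to every utility cannot reverse a preference, but when the SP sums differ a suitable b does reverse it.

```python
def seu(sps, utilities):
    """Subjectively expected utility: sum of SP times utility."""
    return sum(p * u for p, u in zip(sps, utilities))


def prefers_a(sps_a, utils_a, sps_b, utils_b, a=1.0, b=0.0, c=1.0):
    """Compare SEUs after transforming utilities by a*u + b and SPs by c*sp."""
    seu_a = seu([c * p for p in sps_a], [a * u + b for u in utils_a])
    seu_b = seu([c * p for p in sps_b], [a * u + b for u in utils_b])
    return seu_a > seu_b


# Additive case: the SPs of each bet sum to 1.
additive = ([0.5, 0.5], [10.0, 0.0], [0.5, 0.5], [4.0, 4.0])
# Non-additive case: SPs sum to 0.7 for A and 1.2 for B (hypothetical).
non_additive = ([0.4, 0.3], [10.0, 0.0], [0.6, 0.6], [3.0, 3.0])

for shift in (0.0, 10.0):
    print(shift, prefers_a(*additive, b=shift), prefers_a(*non_additive, b=shift))
```

In the non-additive case the preference for A survives only when b is zero, which is the sense in which such a model needs a ratio scale of utility.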


Empirically Useful ASEU Maximization Models 

In order to carry the analysis of ASEU maximization models further,
consider the way in which [such models would] be used. To predict
choices among simple bets (the use with which this paper is
concerned), you must know the utility and the SPs of each possible
outcome of each bet.
You can then substitute these values into equation (4), 
perform some arithmetic, and predict that the course of 
action with the highest SEU will be chosen. (Such pre- 
dictions will be wrong on occasion. This fact can be in- 
cluded in the model by making it stochastic, or it can be 
excluded from the model and left to a general theory of 
errors, as is done in most psychophysical models. Your 
preference between these two methods of dealing with 
errors, and even the fact that these predictions are some- 
times wrong, are irrelevant to the arguments presented in 
this paper, which concern only the logical apparatus by 
means of which such predictions are made.) 
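
The predictive use of the model amounts to computing one sum per course of action and picking the largest. The sketch below is the deterministic version only; the course names, SPs, and utilities are hypothetical, and zero utility for declining the bet is an assumption of the example.

```python
def predict_choice(courses):
    """Predict the course of action with the highest SEU.

    courses maps a name to a list of (subjective_probability, utility)
    pairs, one pair per possible outcome of that course of action.
    """
    scores = {
        name: sum(sp * utility for sp, utility in outcomes)
        for name, outcomes in courses.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


courses = {
    "accept bet": [(0.4, 5.0), (0.6, -2.0)],  # win or lose
    "reject bet": [(1.0, 0.0)],               # nothing changes
}
choice, scores = predict_choice(courses)
print(choice, scores)
```

A stochastic version would replace the final maximum with a rule giving the probability of each choice; that refinement is outside the argument here.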

In order to use a SEU model, then, you need to know utilities and
SPs independently of the decision you are trying to predict. How
can you find them? That is a very difficult, much-debated question,
which will be ignored in this paper. For the remainder of this
paper, it is enough to assume that some psychophysical method is
available which gives accurate utility and SP measures for any
object or event of interest, and that you have available such
information about anyone whose decisions you are predicting.

Let us suppose that you wish to predict whether a person will
accept or reject a bet which will win him $A if event E happens,
and cost him $B if event E doesn't happen. How do you look up the
SP of event E in order to make your prediction? Event E is in one
sense unique; it has never happened before and never will again.
You can find out its SP in advance only if it has some identifying
property which is identical with the identifying property of some
previous event whose subjective probability for the person whose
decision you are predicting is known.

Identifying properties provide an orderly way to conceptualize the
possibility that more than one SP may correspond to a given OP.
However, a consideration of several
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r 
Vable set of events » then wheneve 


» Subjective probability 
must equal objective probability. THEA. Teads ne hehe 


Tue for a number of cases of wide 


ASEU models must be abandoned, presumably in favor of
NASEU models.


The Probability Preference Data 
A rather Substantial body of da 
tion between SP and oP, Th a come from a series 
of experiments (Edwards , 19533 Edw: 


ards , 195la; Edwards, 


9 prepared, Each 
OPs of Winning (or losing, 

Ing from 1/8 through 8/8. In 

various experiments & total of eight different lists were 
used. The eight bets in each list all haq the same objec- 

sason to prefer 

one bet in a given list to any of the Others in the same 
Paired with One another 
` At various 

times, a total of Well over At var 

quired to choose between the 
Some experiments » the Ss sat in 
Slides of the pairs of bets, and as if they 

were gambling." Other Ss, run individually, first le 
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imaginary choices, then gambled for worthless chips, and 


finally gambled for real money. 
The main results of these experiments, taken as a whole, were

1. Although there were substantial individual differences in
choices, certain patterns of choices showed up in all experimental
groups and in just about all individuals. The two most outstanding
of these patterns were that Ss usually preferred the bet with the
4/8 probability of winning from any positive EV list to the other
bets on that list, and usually preferred the bet with the lower
probability and higher amount of loss in any pair of negative EV
bets.
2. The preferences cited above, and indeed the complete pattern of
choices, were relatively independent of EV level of the list of
bets involved, so long as the zero point was not crossed. This
means that the preferences observed should be attributed to the OPs
involved, which were constant, rather than to the amounts of money,
which varied from list to list. This fact suggests that for these
students and these amounts of money, the utility [curve] is
relatively linear with amounts of money, while SP is not linear
with OP. It also suggests that the value of OP from which SP
deviates most widely is 0.5, for positive EV bets. Using the
assumption that SP = OP and the further assumption that the size of
the just noticeable difference for utility is half a cent, it is
impossible to construct a utility function to account for the
preferences observed. If the size of the just noticeable difference
for utility is assumed to be [...], such a curve can be
constructed, but [...] 12 inflection points between $0 and [...].
The same sorts of statements can be made if the data are analyzed S
by S. (For a fuller explanation of the meaning of this statement
and of the construction methods used, see Edwards, 1954a.)

3. [The comp]lete pattern of choices changed radically [...] to
negative EV bets, even though exactly [...] used. The main change
was a strong [...] EV bets in which the probability of losing was
low and the amount of possible loss high. When this preference was
removed from the data by crude statistical means, the residual
preference pattern was pretty much the mirror image of the
preference pattern
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t 
Which might arise in & gambling experimen 
because S win 

about the n 


smaller amounts of MER 
icantly c anging their patterns of choic 
8 finding has been confirmed in a study devoted es- 
pecially to 44 (Edwards 195ha) 
s and 2 Cause me 
SP = 


t 
to reject the hypothesis tha 
They Therefore also me, in conjunction à 
d a Ove, to reject all EU qm 
S. The implications of findings 3, 4, and 5 ve 

" NASEU model are 4, 3 they will be discussed bel 


» if any, of a 
Wation (5). It defines 
S ASEU models, m. Wh&t does the opera- 
ion called for by the Si in equation (4) 
SEU mode1? For an ASEU m del, that sigma 
Simply derives from the inition of a probability 
e additi i t 
measure, For a NAS; model, Ti has Vrae = 
matical Justification, variety of Psycho]. 
ments about its plausibility are ava; le. This paper 
Will not press them; Svidence to be Presente later argues 
against this form of NASEU model, and I do not advocate or 
defend such a model. he Purpose Of this Section ig to 
introduce some ideas which are a t of the Model which 
ime being, therefore X 
to say that no law of m i 


emat bids l 

m ics forbids the 
ty and non ditive gp values, the 
he Products , or test of t 
Sis that people make an te 


he hypothe- 
Way that they in 
fact maximize the resulting sums. y 
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Ratio Scale Utilities 
However, Theorem 1 indicates that a NASEU model must use a ratio
scale measure of utility. A ratio scale of utility implies that
utility has a true zero point. Where is it? Only one answer is
plausible: Where you now are. Zero utility is your current
position, and you can never leave it.

This is not exactly a new idea. The first implicit use of it was by
Mosteller and Nogee (1951), who used the EU model to make a utility
scale for money (and, incidentally, proposed non-additive SPs as an
alternative explanation for their results). Mosteller and Nogee
used a gambling situation in which real money changed hands during
the course of the experiment. But in determining the utility of a
given amount of money, they simply found an indifference point
which involved that amount of money without moving the origin to
take into account S's financial status at the time they measured
it. Such a procedure can yield a classical utility function only if
the form of that function is invariant up to a linear
transformation under movements of the origin (through which all
utility functions conventionally pass) along the function. Only a
very limited class of functions, of which the most familiar member
is the straight line, has this property, and the utility curves
Mosteller and Nogee obtained don't look like any member of that
class. Consequently those curves in fact defined zero utility as
the present monetary status of each S, and so were of the type that
this section advocates, even though Mosteller and Nogee didn't say
so.

The first self-conscious use of an idea like that proposed here was
made by Markowitz (1952). [He proposed that the origin] of the
utility scale be [tak]en as S's customary financial position, and
that the form of the function changes when that customary financial
position changes. He used this idea to remedy a deficiency in
Friedman and Savage's (1948; 1952) previous account of [...]
buying. The only difference between Markowitz's position and this
one is that Markowitz defines zero utility as the customary
position, while this paper [defines it as] the current position.

[...] conception of utility is novel and quite [...] traditional
one, it is neither in[ternally] contradictory nor absurd. Nor does
it imply that
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People cannot Change their habi 
financial status. 


they possess, This UN EE 
in principle testable, though testing it experimenta 
is hard, 


€ addition theorem for SPs, even though 


Significance both mathematically 
The followi 


ves a theorem which in principle (and 
perhaps in fact) could be i 


ms (1) through (4). This 
means that an empirical verificati 
evidence for thi 


money changes hands. S dekorita Bee 
point (x, p); that is, b 

represented by & plane peg p  ompletely 
x= 0. Now, consider the bets (x > 


enger SC Sé 
ea 
ense, Suggests + deal of 


12 


CO 
(a reasonable idealization), and if the direction of preference
reverses for any negative values of e and f (another reasonable
idealization), it follows that a set of indifference curves can be
drawn in the x p plane. An indifference curve in this application
is simply a function which defines a set of bets among all of which
S is indifferent (see Coombs and Beardslee, 1954, and Edwards,
1954c, for details). A set of indifference curves is called an
indifference map; Figure 10-1 shows such a map.


FIGURE 10-1. Hypothetical Indifference Map Among Simple Bets.
[Axis label: Dollars]
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Since each curve identifies bets among 
all of which S is indifferent, all bets on a given curve 


must have the same SEU. In short, in utility and SP
measures the equation of each indifference curve c must
be 


But indifferenc 
subjective, units. 


equation (9), which is of 
gular hyperbola. In fact, 
A, B, C, and D, as in Figure 10-1. From now on, S_A means the slope
of the indifference curve which passes through bet A.

Theorem 2: If any one of the EV, EU, SEV, or SEU models, additive
or otherwise, is correct, then S_A S_C = S_B S_D, and if this
condition on the slopes of an indifference map exists, then it
follows that for these data at least the NASEU model is correct.
The proof of this theorem will not be given here.
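
Although the proof is omitted, the slope condition can at least be checked numerically under explicit assumptions. The sketch assumes that each bet (x, p) has SEU equal to SP(p) times U(x), so that indifference curves satisfy SP(p) U(x) = constant, and that A, B, C, and D are the corners of a rectangle in the x p plane with A and C diagonally opposite. That arrangement is a guess, since the figure is not reproduced here, and the utility and SP functions are hypothetical.

```python
def indifference_slope(x, p, utility, sp, h=1e-5):
    """Slope dp/dx of the indifference curve through the bet (x, p).

    Along sp(p) * utility(x) = constant, implicit differentiation gives
    dp/dx = -(sp(p) * utility'(x)) / (sp'(p) * utility(x)).
    Derivatives are approximated by central differences.
    """
    d_utility = (utility(x + h) - utility(x - h)) / (2.0 * h)
    d_sp = (sp(p + h) - sp(p - h)) / (2.0 * h)
    return -(sp(p) * d_utility) / (d_sp * utility(x))


def utility(x):
    return x ** 0.5   # hypothetical utility of winning x dollars


def sp(p):
    return p ** 0.7   # hypothetical subjective probability function


# Assumed layout: A and C are diagonally opposite corners of a rectangle.
s_a = indifference_slope(1.0, 0.25, utility, sp)
s_b = indifference_slope(4.0, 0.25, utility, sp)
s_c = indifference_slope(4.0, 0.75, utility, sp)
s_d = indifference_slope(1.0, 0.75, utility, sp)
print(s_a * s_c, s_b * s_d)
```

The two products agree because, under the multiplicative form, each slope factors into a part depending only on x and a part depending only on p.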


The Weighted SEU Model 


The idea of SP is fundamentally a psychophysical idea.
People, exposed to a sound stimulus which vibrates at a
certain number of cycles per second, make judgments of
pitch which are a function of that number. Similarly,
people, exposed to an event with a given OP, make judg-
ments of likeliness which are a function of OP, and
probably of other characteristics of the display as well.

But what kind of a function? Most of the argument of
this paper has been devoted to the proposition that for
any of the class of models considered here except NASEU
models that function may only be linear. The probability
preference data indicate that it cannot be linear. Or
do they?

An old familiar finding in psychophysics is that the 
form of any subjective scale depends on the methods used 
to determine it. The same proposition may be true for 
SP and utility scaling. The probability preference 
experiments were a rather indirect method of inferring
utility and SP properties from gambling choices. What 
results would more direct methods yield? 

tility of money, it seems unlikely 
that direct methods of psychophysical scaling will be 
question which would have to be 
"What SN of money is twice as 
tw to you? + seems unlikely that 
ears give an answer systematically different from
"Four dollars." The difficulty is that money has an
invincibly numerical character, and most people would
probably respond to mathematical properties of these
numbers rather than to any subjective values they might
have. In any case, the evidence is continuing to
accumulate that for small amounts of money utility is
linearly related to dollar value--in which case
psychophysical determination of utility scales is not
very likely to be interesting. Of course psychophysical
methods could be used to scale the utility of other
objects which have a less numerical character, but it
isn't obvious how such experiments would shed much light
on general properties of utility functions, because in
such experiments the


e is difficult to define. 


In the case of SP, on the other hand, some psychophysical
data exist. E. H.


(two 20-sidea dice) chose one 10 


» Just as in Shuford's 
experiments » and that the choice 
are somewhat distorted. A 
this finding is that at almost th 
say that SP is and is not equal 


tation is correct, Something pec 
on. 


9 same moment the Ss 
If this interpre- 
Clearly going 


l 
matics by supposing that they attach + 
Wi which expresses the rel&tive 
bility of that probability. A mode 
& minor variation on equation (4): 
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10 = 
(10) WSEU : (ew) u» 


Mathematically, the WSEU model is exactly the same as
the NASEU model (which is why this paper has devoted so
much attention to the properties of the NASEU model)
except that the WSEU model comes by its addition
operation naturally rather than arbitrarily. Since there
is no mathematical difference between the NASEU and WSEU
models, the difference between them is presumably one of
esthetics; I find the WSEU model esthetically more
appealing.


Variance Preferences 


Allais (1953a; 1953b) and others have suggested that
the variance of a bet may be more important than its EV,
objective or subjective, in determining preferences.
This suggestion is intuitively appealing, but extremely
hard to translate into an experiment. I published one
attempt to establish the existence of variance prefer-
ences (Edwards, 1954d) in which the conclusion was that
they exist, but are small in size relative to probability
preferences. Coombs (personal communication) has some
data which suggest that they exist and are much more
important than probability preferences. In any case,
the existence of variance preferences is inconsistent
with any model like equations (4) or (10), or indeed
with any maximization model of the kind discussed in
this paper.

The condition on slopes of indifference curves which
was discussed above would in principle provide a crucial
experimental test for the reality of variance
preferences, in the sense that if it is satisfied then
variance preferences cannot have played a role in the
choices from which the indifference curves were
constructed. But it is extremely difficult to do the
necessary experiment. A more likely avenue toward
successful testing of variance preferences emerges from
the fact that the utility of money can safely be assumed
linear with money for small amounts of money. (This
conclusion is based on a wide variety of experimental
data (cf. Edwards, 1954a; Edwards, 1955); several
experimenters share it; but there are still plenty of
dissenters.) It is possible to construct square matrices
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experiment reveals that 
and are important, what direc- 
endeavor take then? I don't know. 


variance Preferences exist 


ty Property » then 
y 


on a ratio scale. "USt be measured 

If subjective Probabilit4 
property, and if the decision ä 
is in principle applicable to conceivable nol = Son 
events, then whenever objectiy, 9 
subjective probability mist equal Objectiy, 
given certain reasonable assumptions. Th S See SE 
the concept of Subjective Probabilitieg whic) 




one, and decision models based on that concept, cannot be 
very helpful in explaining decision making; a large amount 
of data indicates that subjective probability cannot 
always equal objective probability. 

One alternative is to use a decision model in which
subjective probabilities need not add up to any particu-
lar constant. Such a model is possible; its properties
are explored. One property is that in such a model,
utility must have a true zero point. The only reasonable
zero point for utility is where you now are; this notion
is not new to decision theory.

Any form of the expected utility maximization hypothe-
sis (with additive or non-additive subjective
probabilities) implies a strong relationship among the
slopes of indifference curves for simple bets. This
relationship offers an experimental test of the
applicability of models of this class. The experiment is
very difficult to perform, but the result which would
confirm the applicability of models of this class is
unlikely. Further theories about decision making in
risky situations should probably concentrate on variance
preferences.


11    Sidney Siegel


Pennsylvania State University


Individual Decision-Making Under Risk: A Model
for Finite Choice Structures


This paper presents a model of individual choice be-
havior in risky outcome situations in which the alterna-
tives involve a discrete set of entities (objects and/or
actions). The model is finitistic in the sense that the
set of alternatives and their probability combinations,
between which an individual is to choose, is finite. A
feature of the model is the fact that it is possible to
derive a person's higher ordered metric scale of utility
(subjective value) from a subset of his choices, and
thus to predict the remaining choices. That is, the in-
formation in the higher ordered metric scale is used to
predict those choices of the person which were not used
in deriving the scale.

Coombs (1950; 1952) suggested the use of the term
"ordered metric" for those scales which have not only an
ordering on the set of entities but also at least a
partial ordering on the distances between the entities.
Coombs also presents a method, which he calls the "unfold-
ing technique," for obtaining an ordered metric scale.
Siegel (1956) has suggested that those scales which have
a complete ordering on the distances be called "higher
ordered metric." That is, a higher ordered metric scale
not only orders the entities (an ordinal scale) and orders
all the simple distances between the entities (as does
an ordered metric scale) but also orders all contiguous
combinations of the distances (i.e., orders all distances
between pairs of entities).

This paper was prepared while the author was on leave at
the Center for Advanced Study in the Behavioral Sciences.
I am grateful to Kenneth J. Arrow, Selwyn W. Becker,
Robert M. Solow, and John W. Tukey for discussions which
have benefited this paper.
esirable or necessary (Davidson, SC á 
» 1957; Edwards, 1954c; Friedman & Savage, 1948; 
Mosteller & Nogee, 1951; Savage, 1954; Von Neumann &


sions of Coombs and Komorita (1955), Fagot (1957), and 
7)- it has been pointed out 


or achieving measurement 
terval scale, Von Neumann 
P. 18) require that probability 
unity. Although such a 
» 8t present it cannot 


SY Occur when the subject 
ties whose utilities stand in some 
specified relation to a For an example, see 


uch as p 
of 


eat is concerned, 
s Measurement that can be 
achieved at Present is highe: te 5 
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2. A relation Pg between pairs of elements in KxK; 
i.e., Pq is a quaternary relation on the set K and thus
a binary relation on the Cartesian product KxK and is
interpreted so that wz Pq xy holds when the difference in
utility between w and x is greater than the difference
between y and z.

The second primitive notion, the relation Pq, requires
some explanation. It is to be understood in terms of the
operational definition and other considerations which
follow.

The essential device on which the model depends is a
one-person game in which the person chooses between two
alternatives, each of which is a probability combination
of two outcomes. The format for each offer is


                          Alternative 1    Alternative 2

If event E occurs         you get w        you get x

If event not-E occurs     you get z        you get y


The person is required to choose between Alternative 1
and Alternative 2, and then a chance event, E, with sub-
jective probability, P = 1/2, determines the outcome of that
al 

lesa a hypothesis is that an individual makes 
choices among uncertain outcomes as if he were trying to
maximize expected utility. If the person chooses Alterna-
tive 1, then the probability combinations may be ordered
thus:


(1)  (w, z; p) > (x, y; p)

and

(2)  p U(w) + (1 - p) U(z) > p U(x) + (1 - p) U(y),


where p is subjective probability, and U is a utility
function to be discussed later. If p = 1/2, then


(3)  U(w) + U(z) > U(x) + U(y)


and, finally,

(4)  U(w) - U(x) > U(y) - U(z),

which is to say that the difference in utility between w
and x (i.e., wx) is greater than the difference in utility
between y and z (i.e., yz). That is, when a person's
choice indicates that (w, z; 1/2) > (x, y; 1/2), then we may

It should be noted that the distances are directed
distances. For example, from (3) we could obtain
U(z) - U(y) > U(x) - U(w). But since w > x > y > z, we
multiply through by -1 in order to get positive utility
intervals, as shown in (4).
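
The step from a single choice to an ordering of utility
differences can be sketched in code. The sketch below is
only an illustration: the function names and the utility
numbers are hypothetical, and it assumes the event
probability is exactly 1/2 and that w > x > y > z.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def expected_utility(first, second, utility, p=HALF):
    """Expected utility of the probability combination (first, second; p)."""
    return p * utility[first] + (1 - p) * utility[second]

def implied_relation(choice):
    """Ordering of utility differences implied by choosing between
    Alternative 1 = (w, z; 1/2) and Alternative 2 = (x, y; 1/2)."""
    if choice == 1:
        return "U(w) - U(x) > U(y) - U(z)"
    return "U(y) - U(z) > U(w) - U(x)"

# Hypothetical utilities with w > x > y > z (illustrative numbers only).
U = {"w": 10, "x": 6, "y": 3, "z": 1}
alt1 = expected_utility("w", "z", U)   # (w, z; 1/2)
alt2 = expected_utility("x", "y", U)   # (x, y; 1/2)
choice = 1 if alt1 > alt2 else 2
print(choice, implied_relation(choice))
```

With these numbers Alternative 1 has the larger expected
utility, and the difference U(w) - U(x) = 4 does exceed
U(y) - U(z) = 2, in line with inequality (4).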
D ( ) notion, P PET quaternary 

t wz Pg xy holds if, and 

2; the person chooses Alternative 1 

etween Alterna- 


of Pq, it is necessary ee 
ion U used in the inequa. 
ties above, 


The utility function U is a real-valued function de-
fined over K such that for every w, z, x, and y in K

a) xx Pq yy if, and only if, U(x) > U(y), and

b) wz Pq xy if, and only if, U(w) + U(z) > U(x) + U(y)
and U(w) - U(x) > U(y) - U(z).


equalities, If th 
& person's Choices 
(Kuhn, 1956), thon 
utility function can be Constructeg 
on Pg are satisfied 1 Thes k 
Axiom 1. The relation p S a str he 
diagonal elements of the Car a oiee tering ve 


That is, 


ticular the larger) values of y 
The argument on this point is p 


n ntiti * 
resenteq in Scot and Sien 
(1957). 
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Pq satisfies the following conditions: 

&) exactly one of the following holds: xx Pa yy» 
Ny Ea xx 

b) if xx Pq yy and yy Pq zz, then xx Pq zz.


Axiom 2. For all w, x, y, and z from K,


a) if xx Pq yy, then xx Pq xy and xy Pq yy;
b) bf ww B, xx and xe PQ yy, then ww By Kp. vr fS S 
wx P xy, and wx P, HI 


c) if ww Pq xx, xx Pq yy, and yy Pq zz, then wx Pq yz,
and wy Pq xz.
If a couple consists of any two elements in KxK, then

Definition 1. A non-enveloped couple is a pair of
elements in the Cartesian product KxK which is ordered
either by the strict ordering of Axiom 1 or by the appli-
cation of Axiom 2 to the strict ordering of Axiom 1.

Definition 2. All pairs of elements in the Cartesian
product KxK which are not non-enveloped couples are
enveloped couples.

Definitions 1 and 2 imply that the Cartesian product
KxK may be partitioned into two subsets, one consisting
of those elements ordered by Axioms 1 and 2 (the non-
enveloped couples) and the other consisting of the
enveloped couples. For example, if the set K consisted
of four entities, w, x, y, and z, and if by Axiom 1 the
elements of the diagonal of KxK were ordered in this way:

ww Pq xx Pq yy Pq zz

then x and y are enveloped (i.e., completely nested or
"sandwiched") between w and z and the following consti-
tute all the enveloped couples in the Cartesian product
KxK:

xy is nested in wz and thus (xy, wz) constitute an
enveloped couple

xx is nested in wz and thus (xx, wz) constitute an
enveloped couple

yy is nested in wz and thus (yy, wz) constitute an
enveloped couple

xx is nested in wy and thus (xx, wy) constitute an
enveloped couple

yy is nested in xz and thus (yy, xz) constitute an
enveloped couple
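
The list of enveloped couples can be checked by brute
force. The sketch below is an illustration under one
stated assumption: that the ordering produced by Axioms 1
and 2 coincides with component-wise dominance between
probability combinations (each outcome of one combination
at least as good as the corresponding outcome of the
other). The function name is hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def enveloped_couples(entities):
    """Pairs of probability combinations not ordered by dominance.

    `entities` is listed from most to least preferred, so a smaller
    index means a higher utility. Each combination is a tuple
    (better_or_equal, worse_or_equal), e.g. ('w', 'z').
    """
    rank = {e: i for i, e in enumerate(entities)}
    combos = list(combinations_with_replacement(entities, 2))

    def dominates(a, b):
        # a dominates b when both of its outcomes are at least as good.
        return rank[a[0]] <= rank[b[0]] and rank[a[1]] <= rank[b[1]]

    return [(a, b) for a, b in combinations(combos, 2)
            if not dominates(a, b) and not dominates(b, a)]

couples = enveloped_couples(["w", "x", "y", "z"])
for outer, inner in couples:
    print("".join(inner), "is nested in", "".join(outer))
print(len(couples))                                   # 5
print(len(enveloped_couples(list("ABCDE"))), comb(6, 4))
```

For four entities the enumeration returns the same five
couples listed above; for five entities it returns 15.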


b) ifwz ^ Xy and yy P zu are both enveloped couples, 


then wv Pa xu 


Which is to say that

a) either wx > yz or yz > wx and,

b) if wx > yz and yz > uv then wx > uv.

Axiom h. re wz Pq xy and xy p 


q YZ are both enveloped 
q YY and xv p 


q Zy. 
Which is to say that if wx > yz and xy > zv, then
wx + xy > yz + zv. That is, if distance d1 is greater
than distance d2, and if distance d3 is greater than
distance d4, then the distance (d1 + d3) is greater than
the distance (d2 + d4).


Definition 3. Axioms 1 through 4 define the higher
ordered metric model.


The Generalized Lattice

At this point a i


be presented (see 
how KxK is af- 
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NE S 
FIGURE 11-1. The Generalized Lattice.


The lattice contains the total number of different


it consists of the diagonal 


ments in Kj 1.6.) 
ee SR plus one-half of the remaining elements 
low or above the diagonal). 


(all 
It is interesting 


ordering of KX. 
and which leads 


That is, 


to an ordinal scale) generates the 
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Particular Structure of the lattice. 
& heuristic function in empirical work 


derive his higher 
ric scale of utility. That is, the lattice 
May be used 


the non-enveloped couples, 
The central features of 


SO that the higher probability co 
the lower, e.g., 


AN Pq BN because (A, N; 1/2) > (B, N; 1/2)
A(N - 1) Pq BN because (A, N - 1; 1/2) > (B, N; 1/2)


AA Pq BC because (A, A; 1/2) > (B, C; 1/2)
Etc.


33) 


(A, C; $) 2 (B, B; à 


(A, N; $)? (5, N-a 


deri 9r ordered 
ng can be r 
satisfying Axioms 5 and 4, ealizea by 
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From the lattice, then, one may partition the set KxK 


into two subsets, say O and OM, with the elements of
subset O consisting of the non-enveloped couples and the
elements of subset OM consisting of the enveloped couples.


The discussion which follows considers the size of the
subsets O and OM, and also considers other important
parameters of the model.

The Numerical Constants of the Model 

The numerical constants of the model are enumerated
below. Accompanying each is a description, and a deriva-
tion when necessary.


1) N = the number of entities in set K.


2) N_Pq = \binom{N(N + 1)/2}{2}, where N_Pq is the total
number of different pairs of elements in KxK. The lattice
(Figure 11-1) shows that the number of probability
combinations is the sum of the arithmetic series,
1 + 2 + ... + N = N(N + 1)/2. The total number of
instances where Pq holds is the number of ways of
selecting two probability combinations from the lattice.


3) N_OM = \binom{N + 1}{4}, where N_OM is the total number
of Pq relations for the subset OM, i.e., the total number
of enveloped couples. The subset OM consists of the
enveloped couples. Therefore every Pq relation in
the subset (where w > x > y > z) is either of the
form wz Pq xy (i.e., involves four different
entities) or is of the form wz Pq xx (i.e., involves
three different entities). The latter is the case
when a diagonal element of KxK is involved. There-
fore

N_OM = \binom{N}{4} + \binom{N}{3}

and therefore

N_OM = \binom{N + 1}{4}.


4) N_O = \binom{N(N + 1)/2}{2} - \binom{N + 1}{4}, where
N_O is the total number of Pq relations in subset O.


5) minn = F _ ið 


Struct the Appropriate lattice. 
6) max N = 


the &ppropriate lattice. 


(N+1 
7) max Noy = ( y ) > where max A. 


In practice, this maximum 
can always be reduced b 


(2) 
2 
8) Na Ai 2 ) » Where Na is the tot 


The Predictive Efficiency of the Model

Predictive efficiency is in many respects
the notion of "predictive Power" which Fagot quen > 
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applied to an ordered metric model of decision-making 
under conditions of certainty. It is also related to the 
concept of "systematic power" formulated by Hempel and 
Oppenheim (1948) and Kemeny and Oppenheim (1955).


The predictive efficiency of a decision-making model
is related to the size of the subset of choices which
must be observed in order to arrive at a complete charac-
terization of the entire set of choices. The predictive
efficiency thus indicates the efficiency of a model in
explaining or predicting the observations.

It was seen in the previous section that the total
number of different instances of Pq in KxK is
N_Pq = \binom{N(N + 1)/2}{2}. The predictive efficiency
of the model may be determined by partitioning the set of
relations, N_Pq, into two subsets, A and B, such that
subset A contains those instances whose knowledge is
sufficient to predict the remaining relations in N_Pq by
means of the axioms of the model, and subset B contains
those which may be thus predicted. The predictive
efficiency is then the ratio of the predicted relations,
B, to the total number of relations, N_Pq = A + B. That
is, the predictive efficiency, P_eff, is

P_eff = B / N_Pq = B / (A + B).
The notion of predictive efficiency is the basis for the
following additional numerical constant of the model.


9) min B = N_Pq - max A, where min B is the minimum
number of relations that can be predicted from knowledge
of the relations in subset A when subset A is a maximum
and is computed from constants 6 and 7 given previously.

An expression for the minimum predictive efficiency
may now be stated:

min P_eff = min B / N_Pq = (N_Pq - max A) / N_Pq

3 1.e., the number 
of possible predictions is &re&ter than the number of 
relations which must b 


predict the relations in B. 


For example, if the number of entities being scaled
is N = 5, then, if the worst possible higher ordered
metric scale results, then

min P_eff = 80/105 = .76. This means that each
observation leads to about three predictions.

It is interesting to note that

\lim_{N \to \infty} min P_eff = 2/3
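
The constants and the N = 5 figures can be recomputed
directly. The sketch below is illustrative: the function
names are hypothetical, and the worst case max A is taken
to be C(N, 2) + C(N + 1, 4), where the C(N, 2) term is an
assumption inferred from the example's 105 - 80 = 25
rather than read from the text.

```python
from fractions import Fraction
from math import comb

def n_pq(n):
    """Total number of pairs of probability combinations (constant 2)."""
    return comb(n * (n + 1) // 2, 2)

def n_om(n):
    """Number of enveloped couples (constant 3): C(n, 4) + C(n, 3)."""
    return comb(n, 4) + comb(n, 3)

def min_p_eff(n):
    """Minimum predictive efficiency, (N_Pq - max A) / N_Pq.

    Assumption: max A = C(n, 2) + n_om(n); the C(n, 2) term is
    inferred from the N = 5 example, not taken from the source.
    """
    total = n_pq(n)
    max_a = comb(n, 2) + n_om(n)
    return Fraction(total - max_a, total)

print(n_pq(5), n_om(5), min_p_eff(5))   # 105 15 16/21
print(round(float(min_p_eff(5)), 2))    # 0.76
print(float(min_p_eff(1000)))           # close to 2/3
```

Under that assumption the ratio for N = 5 is 80/105, and
for large N it approaches 2/3 because N_Pq grows like
N^4/8 while the enveloped couples grow like N^4/24.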


A Hypothetical Experiment 

To draw together and to illustrate what has been said
about higher ordered metric scaling, a hypothetical ex-
periment is presented here.


Suppose five entities (N — 
preference, These entities mg 


subject's choices that 
AA Pq BB Pq CC Pq DD Pq EE.


For this case, the lattice shown in Figure 11-2 contains
an exhaustive set of Pq inst
From Figure 11- 
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N„ = Zeie i fe) zii 


P 
q 
FIGURE 11-2. Lattice for N = 5 and A > B > C > D > E.


The probability combinations Gem? ene 
two subsets; O an . q 
where any two probability combinations
can be connected on the lattice by a consistently
rising (or consistently descending) line. Otherwise the
pair of probability combinations is in OM. The prefer
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It is the relations in subset OM which contain ordered 
metric information. From Figure 11-2 it may be seen that

N_OM = \binom{6}{4} = 15. It is not necessary to observe the


need be Observed, 


depends Partly on the order of presentation of the 
choices and partly 


ences (direction of his choices), 


Suppose our subject is an optimum case; i.e., suppose
we can construct the lattice on the basis of his first
four choices (min N_O = N - 1 = 4), as follows:


(1)  (A, A; 1/2) > (B, B; 1/2)

(2)  (B, B; 1/2) > (C, C; 1/2)


(5) (A, E; Sie, De 3) ana therefore ID < pg 


B; 3) ana therefore AF > BD 


AE > BE > CE > DE > AD > AC samy a 


cale og ut 


five entities may be represented ag follows; 
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A B F D E 
The predictive efficiency for this instance is

P_eff = 1 - 7/\binom{15}{2} = 98/105 = .93

According to the interpretation of P_eff in terms of the
number of predictions per observation, in this instance
we have 14 predictions possible for each observation.
That is, having observed the above seven decisions
(choices) of the subject, we may predict the other 98
decisions he would make if confronted with all other
possible choices among probability combinations.
A maximin procedure (Siegel & Becker, 1959) has been
developed which guarantees a maximin P_eff on the average.
This procedure is based on a method in which the alterna-
tives are presented to a subject in such an order that
each choice he makes divides the remaining possible higher
ordered metric scales ("possible" in the sense that all
are consistent with the choices he has made so far) in
half, thereby enabling the experimenter to isolate the
... minimum information. Where N = 5, the maximin
procedure enables the derivation of an individual's
higher ordered metric scale on the basis of 14 observa-
tions at most. The expected P_eff is thus


maximin P_eff

Examples of higher ordered metric scaling and its uses
in decision-making research may be found in Becker and
Siegel (1950), Hurst and Siegel (1956), Siegel (1956),
and Siegel (1957).


12    Clyde H. Coombs
      and R. C. Kao

University of Michigan


On a Connection
Between Factor Analysis and
Multidimensional Unfolding


The Unfolding Technique for preferential choice be-
havior (Coombs, 1950; Coombs, 1952) is derived from a
model of the following form. An individual, in making
preferential choices among a set of alternatives, may be
represented by a point in a Euclidean r-dimensional
space, E^r, and, correspondingly, each alternative may be
represented by a point in the same space. The individual
prefers one alternative to another if and only if the
point corresponding to the preferred alternative is
nearer to the point corresponding to the individual. To
each point corresponds an r-tuple which is a set of meas-
ures on the dimensions spanning the space. These dimen-
sions may be interpreted as psychological variables
generating the preferences of the individuals, where the
point corresponding to an individual is an ideal point
representing a hypothetical alternative preferred to all
possible ones. Inconsistency of preferences, to be dis-
tinguished from intransitivity, may be generated by
random variability in the locus of points (Coombs, 1958).
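
The preference rule of the model, nearer to the ideal
point means more preferred, can be sketched as follows.
The coordinates and the function name are hypothetical,
chosen only to illustrate the rule in two dimensions.

```python
from math import dist

def i_scale(ideal, stimuli):
    """Rank stimulus labels from most to least preferred.

    `ideal` is an r-tuple for one individual; `stimuli` maps each
    label to an r-tuple in the same space. A stimulus nearer to the
    ideal point is preferred to one farther away.
    """
    return sorted(stimuli, key=lambda label: dist(ideal, stimuli[label]))

# Hypothetical two-dimensional configuration (illustrative only).
stimuli = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (3.0, 3.0)}
print(i_scale((0.2, 0.1), stimuli))   # ['a', 'b', 'c', 'd']
print(i_scale((2.5, 2.5), stimuli))   # ['d', 'c', 'b', 'a']
```

Two individuals at opposite ends of the configuration
produce reversed orderings of the same stimuli, which is
the sense in which distinct ideal points give distinct
rank orders.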
According to the model, an individual's dominant
preferences may be represented by a rank order scale of
the alternatives given by the transitive set of stochas-
tically determined pair-wise preferences. Such a scale
is called an I scale and may be regarded as folding the
space by picking it up at the ideal point and collapsing
it into a line with the measure of the stimulus points
on this line corresponding to their respective distances
from the ideal point. Distinct ideal points generate
distinct I scales in this method. With ordinal preference


of determining the confi 
Stimuli ana individuals ( 
by W. L. Hays (1954). 

The following problem naturally arises. Suppose one
intercorrelated the individuals' I scales and factor


natives are all 
To avoid sampling 
9 dense and that the 


very close to him. Clearly, their preference orderings
will be almost identical and will correlate close to
plus one. Individual A's I scale
I scales of oth
the correlation matrix will be a semicircle with the indi-
viduals corresponding to a fan of vectors such that the
vector of the median individual projects vertically upward
and orthogonal to the vectors of the two extreme indi-
viduals which form an angle of 180 degrees. The order of
the termini of the vectors on the arc would correspond
exactly to the order of the corresponding points on the
original line.
If one factor analyzes such a configuration by the
method of principal components, the first dimension
would be the original line which generated the preferen-
tial choices and the second dimension would be the vector
of the median individual on the line. This second dimen-
sion, on which the projections of individual points are
in reverse order to how closely each is to all the others
on the line, is in another context, called a social
utility (Coombs, 1954). The higher the projection the
more nearly that point best represents all the other
points in the sense of being nearest to them all.

If we consider the case of a two-dimensional latent
attribute space generating the preferential choices we
now have two dense and superimposed bivariate distribu-
tions, one for individuals and one for stimuli. If one
considers the correlation of the I scale of an individual
on the rim of this space with other individuals, it seems
reasonable that the correlations will progressively
decrease through zero to minus one as one approaches an
individual across the space from him and that the median
individual on the plane will correlate non-negatively
with everyone. The configuration generated by the set
of unit vectors is now a hemisphere in three dimensions
with the median individual represented by a unit vector
perpendicular to the plane in which the vectors of all
individuals on the rim of the plane lie.

If such were the case, a factor analysis would yield
three dimensions, with the third principal component
again corresponding to social utility and the first two
representing the original space which generated the
preferential choices.

as intuitively obvious, we may generalize
n to a space of r dimensions in which we
nfiguration corresponding to the cor-
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the first r dimensions would corres- 
pond to the original space. 


through simulation on the computer. Any attempt to 


proposition would 

necessarily lead to some distortion, the matching of the 
two being particularly Sensitive to the density and the 
distribution of stimulus points and t 


Computer Simulation¹


In order to test the plausibility of the proposition
discussed above under rather general and varying condi-
tions, several problems were constructed and run, of

major role. These will be presented first. 
Three sets of 15 random numbers Were taken to repre- 


» and another 
three sets of 30 random numbers, those of 30 stimuli in 


the same space (RAND Corp., 1955). We restricted all 


basic cube. A third
set of points was taken to represent a second set of
stimulus points, these being the 64 lattice points of a
"grid" contained in the basic cube


EE 
lThe authors are indebted to L. A. Re 

‚© Raphael, Caroli 1 
and F. M. Goode for Progranming assistance, and or S Tefft 
Staugas for providing other computer services are M 
stages of this study. var 
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The Euclidean distances of each individual from all 
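the stimuli (random or lattice) were computed and these
measures provided an I scale for the individual which is
a ratio rather than a simply-ordered scale. The product
moment correlations:

\[
r_{\alpha\beta} =
\frac{n \sum_{j=1}^{n} \alpha_j \beta_j
      - \sum_{j=1}^{n} \alpha_j \sum_{j=1}^{n} \beta_j}
     {\sqrt{\left[ n \sum_{j=1}^{n} \alpha_j^2 - \left( \sum_{j=1}^{n} \alpha_j \right)^2 \right]
            \left[ n \sum_{j=1}^{n} \beta_j^2 - \left( \sum_{j=1}^{n} \beta_j \right)^2 \right]}}
\]

were then computed between each pair of individuals' I
scales α = (α_1, ..., α_n) and β = (β_1, ..., β_n), n = 30 or 64.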
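
As an illustration of this computation, the sketch below
correlates distance-based I scales for three hypothetical
individuals on a line of eleven evenly spaced stimuli.
The configuration and the function name are assumptions
made for the example; they are not the random or lattice
stimuli used in the study.

```python
from math import dist, sqrt

def product_moment(alpha, beta):
    """Product moment correlation between two I scales of equal length."""
    n = len(alpha)
    s_a, s_b = sum(alpha), sum(beta)
    s_ab = sum(a * b for a, b in zip(alpha, beta))
    s_aa = sum(a * a for a in alpha)
    s_bb = sum(b * b for b in beta)
    numerator = n * s_ab - s_a * s_b
    denominator = sqrt((n * s_aa - s_a ** 2) * (n * s_bb - s_b ** 2))
    return numerator / denominator

# Hypothetical one-dimensional case: stimuli at 0, 1, ..., 10.
stimuli = [(float(k),) for k in range(11)]
left, middle, right = (0.0,), (5.0,), (10.0,)
d_left = [dist(left, s) for s in stimuli]
d_middle = [dist(middle, s) for s in stimuli]
d_right = [dist(right, s) for s in stimuli]

print(product_moment(d_left, d_right))    # extreme individuals: -1.0
print(product_moment(d_left, d_middle))   # extreme vs. median: 0.0
```

In this small case the two extreme individuals correlate
minus one and the median individual is uncorrelated with
either, which matches the semicircle picture described
for the one-dimensional space.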
The two 15x15 correlation matrices (M_r for the matrix of
correlations of individuals over random stimuli and M_l
for correlations of individuals over lattice stimuli)
with unity in the diagonal were then factor analyzed by
means of the principal components method. From the char-
acteristic values and vectors so obtained, factor load-
ings were computed using

HA ` 
i (e Ët ee 3 


mi dees 


^m 


(1) Sa = 
{ 


where np; = PP d. is the gun characteristic vector 
and M ine corresponding characteristic value. The char- 
acteristic values of M, Were 6.39715, 4.01020, 2.35246, 
1.99655, 0.59150, 0.10905, 0.06822, 0.03552, 0.02680, 
0.01196, 0.00999, I 0.00075, and ery those 
for Me were 5.21565 3.94788, 5.41530, 2.4948, 0.12577, 
0.07310, 0.05066, 0.02790, 0.02052, ee 0.00515, 
0.00385, 0.00161, 0.00104, and 0.00014. It can be seen
the magnitude of the characteristic 


arp drop in 
Kier? after the fourth one. We therefore took 


ums Lëtz , San and aj) as factor 
the iret fo A me 15 erede CN in four 
ee Two crucial questions arise. First, how 
inal coordinates of the individual points 

ensions related to their factor loadings in 
nd, what is the significance of 
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the configuration of the individual points in the original 
genotypic space. Hence, the first question can be settled
if we show that the original set of coordinates for 15
individuals can be "imbedded" into the first four factors
just obtained. To this end, we use Tucker's (1951b)
method of congruence. For the random stimuli, the coeffi-


€ remaining dimension. However, in 
ngruence is relatively high and a 
reasonably good fit of the original configuration of
individual points into a three-dimensional subspace of
the factor space is possible. Since the congruence is
not perfect on all dimensions, we need to distinguish
two three-dimensional subspaces, E2 and EÍ, of the factor
space for both sets of stimulus points.

The second anà more interesti 
in the following manner. The pr 
of the individuals' vectors on the fourth 
orthogonal to the three-dimensio 


(Eg or E$) were Obtained for the random Stimuli and 
lattice stimuli by 


ion on the individual 
points in the original genotypic Space, called (in another 
context) a social utility. Thi 


8 is defined 88 a scale on 
which an individual has a Scale value &iven by 
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(4 oe i 
ey ee Phe us d 


where d_ik is the Euclidean distance between the ith and
the kth individual in a set of m people. The smaller the
average distance s_i is, the closer the point is to the
median of the population. Hence, the rank orders of
people on this scale should correlate highly in reverse
order with those of their projections on the fourth
dimension according to the theory. When the individual
points were ranked in increasing distance from the median
point but in decreasing magnitude of their projections
on the fourth dimension, the Spearman rank correlation,
ρ, between s_i and p_i4 was found to be .895 for the random
stimuli and .964 for the lattice stimuli; between s_i and
p'_i4, ρ was .722 for random and .896 for lattice stimuli.
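
The comparison described here can be sketched in code.
The example below is only an illustration: the five
points are hypothetical, the "projection" is an
artificial stand-in for the fourth-dimension projections,
the average is taken over all m points (a convention
that does not affect the ranks), and the Spearman formula
assumes no ties.

```python
from math import dist

def social_utility(points):
    """Average distance s_i of each point from the points in the set."""
    m = len(points)
    return [sum(dist(p, q) for q in points) / m for p in points]

def ranks(values):
    """Ranks 1..m, smallest value first; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation 1 - 6*sum(d^2) / (m(m^2 - 1)), no ties."""
    m = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (m * (m * m - 1))

# Hypothetical individuals on a line; the median individual is at 2.0.
people = [(0.0,), (1.0,), (2.0,), (3.5,), (6.0,)]
s = social_utility(people)
projection = [-value for value in s]   # artificial stand-in, reversed order
print(s)                               # [2.5, 1.9, 1.7, 2.0, 3.5]
print(spearman(s, projection))         # -1.0
```

Ranking the two variables in opposite directions, as the
text does, turns such a perfect reverse relation into a
correlation of plus one.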
A nonparemetric test on these rank correlations indicated 
that we could reject Ho (no association) at a= 0-005 
level (one-tailed) for all cases. We also obtained the 
best possible P between Si and projections on the fourth 
dimension in the following sense: We determined a number 


e(o < O á 1) such that 


2 2 5 S 
GY pua Pa" (ei, Oe 5) 
yielded the minimum sum of squared deviations from 
$s = (s_i)$, i = 1, ..., 15, in terms of rank orders. The rank 
correlation between $s_i$ and $p_{i4}(\theta)$ was .095 for random and 
.968 for lattice stimuli. It is clear that a better fit 
[...] social utility scale and the set of projections 
[...] vectors on the fourth dimension is 
[...] of 64 lattice stimuli. It appears, [...] 
that there is some evidence for answering in 
[...] questions which led us to include a 
second set of stimulus points in the simulation study. 
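
The comparison just described can be sketched in a few lines. The sketch below is illustrative only: the five points are hypothetical, the names `social_utility`, `ranks`, and `spearman_rho` are our own, and the average is taken over all m individuals (including the zero distance of a point to itself), which is an assumption about the normalization. The hypothetical projections are built to be exactly reversed in rank, so the rank correlation is -1.

```python
import math

def social_utility(points):
    """Average Euclidean distance from each point to all m points.
    Dividing by m (self-distance included) is an assumed normalization."""
    m = len(points)
    return [sum(math.dist(p, q) for q in points) / m for p in points]

def ranks(values):
    """Ranks from 1 (smallest) to n; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical individual points (not the data of the simulation study).
individuals = [(0, 0), (1, 0), (3, 0), (6, 0), (10, 0)]
s = social_utility(individuals)

# Hypothetical projections on the extra dimension, reversed in rank order.
projections = [-value for value in s]
rho = spearman_rho(s, projections)
```

A strongly negative `rho` between the two rank orders corresponds to the high correlation "in reverse order" reported in the text.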
[...] problems were run to see if the same phenomena 
occur in [...] 1 and 2 dimensions. For this purpose, 
[...] of stimulus points was retained by choosing 
one dimension of coordinates for individual points and 
one dimension of coordinates for random stimuli in the E1 
case, and for E2 two dimensions of individual and random 
stimulus coordinates were selected. The matrices of 
correlations (M1 and M2) between individuals over stimuli 
were factor analyzed. The first five characteristic 
values for M1 were 12.02233, 2.70781, 0.13125, 0.10578, 
and 0.01797; for M2 they were 7.58998, 4.81759, 2.36532, 
0.24135, and 0.07356. For the one-dimensional case, 
there is a sharp drop in the magnitude of the characteristic 
values after the second one, whereas for the 
two-dimensional case, this occurs after the third one. 
For the one-dimensional case, the coefficient of congruence 
for imbedding the original individual points in 
the factor space was .976 and the rank correlations 
between $s_i$ and projections on the second dimension for 
the two congruent subspaces were .971 and .761 ($\alpha$ < 
0.005). For the two-dimensional case, the coefficients 


only the original 


A matrix of correlations was computed using the formula 

(6) $\cos\theta = \dfrac{\sum_j x_j y_j}{\sqrt{\sum_j x_j^2 \sum_j y_j^2}}$, 

where $x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$ are any two arbitrary 
rows of coordinates. The first five characteristic 
values were 6.90505, 4.75822, 3.55655, 0.01179, and 
0.00000. It appears clear that for this [...] 


Discussion 

To recapitulate briefly the main results of the preceding 
section, a joint space is taken with both individuals 
and stimuli as points in it, [...] I scale of 
preferences over the stimuli is then constructed for 
[...] give rise to a matrix of correlations which we 
then factor by the method of principal components. In 
each problem, the dimensionality of the factor space is 
noticeably one higher than the original genotypic space. 
But, the configuration of the individual points in the 
original genotypic space can be faithfully reproduced in 
a hyperplane of the factor space. The rank orders of the 
projections of the individual vectors on the extra dimension 
correlate highly in reverse order with those of 
their distances from the median point of the population. 
These results are [...] of different dimensions 
and the stimulus points are quite arbitrarily chosen. 

While the simulation study has shed much light on the 
connection between factor analysis and multidimensional 
unfolding as methods for analyzing data arising from 
preferential choice behavior, [...] 
practical standpoint, remains to be solved before the 
results from the simulation study can be put to use. 
This is the [...] 
in the factor space of preferential choice data. Our 
study begins with a given genotypic space which is, by 
[...] suggests the existence of a 
social utility dimension when preferential choice data 
are factor analyzed by the method of principal components, 
but fails to tell us how we may proceed to determine such 
a dimension. This is equivalent to asking, how do we 
know where to rotate after the factors have been extracted? 
For, as we see in all the problems run in the simulation 
[...] a simultaneous rotation is needed to imbed the 
original genotypic space into a hyperplane of the factor 
space. If the genotypic space is unknown, how do we then 
[...] matrix of transformation to be used? As 
[...] answer to this question. But 
[...] be obtained if we try to fit the 
[...] under different stimulus 
[...] that the same genotypic space 
[...] as a hyperplane in either factor space, 
[...] random stimuli and the other using lattice 
[...] the two sets of stimuli are quite arbitrarily 
chosen and these play only an intermediate role, 
[...] appears reasonable that if a number of such stimulus 
[...] taken, the resulting factors extracted should be 
related in some way. In short, we assert that the only 
configuration which can be imbedded into a hyperplane of 
the factor spaces derived from all possible stimulus 
situations is the set of individual points in the original 
genotypic space. 


13 Ledyard R Tucker 
Educational Testing Service 


Intra-Individual 


and Inter-Individual 
Multidimensionality 


Quantitative study of psychological phenomena traditionally 
has analyzed the problems into two aspects: 
generalized laws of response to varying stimulus conditions 
and laws for individual differences in responses 
to given stimuli. This analysis arises quite naturally 
from the duality of characteristics of the individual 
and of his environment. Each individual brings to the 
psychological experiment his various capabilities and 
predispositions, his attitudes and personality traits 
and, indeed, his protoplasmic make-up. In contrast, 
the experimental environment is more or less under the 
control of the experimenter who may devise the sequence 
of particular stimuli and the form of the responses to 
be made by the individual. The experimenter may endeavor 
to establish certain sets in the individual by instruction 
or by experimental manipulation. In a classical 
experiment on reaction time, for example, the subject 
may be trained to [...] or set such that he is 
primed to react whenever a stimulus triggers this action. 
An observed response thus depends on (a) the stimulus 
condition, and (b) the individual. Differential emphasis 
between these two aspects has led to two corresponding 
lines of development. 

This paper was presented while the author was affiliated with 
Princeton University and Educational Testing Service. 
This research was supported [...] 
the Office of Naval Research [...] 
the National Science Foundation under Grant NSF G-642, and in 
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It is of interest that 
tive studies of psychological phenomena, the investiga- 


An alternative has 
dual separately and to 
the relations among his 
* Thus, psychophysics 


ch later à into 
the correlation coefficients of to EAM 


day. Work on the struc- 
ture of individual differences followed With Binet's 
It is of interest that both of these approaches have 
progressed in the complexity of their mathematical models 
from treatment of single dimensions to multidimensional 
approaches. Psychophysics has considered one aspect of 
stimuli at a time such as brightness, pitch, etc., and 
has established a number of scaling techniques. Fechner 
proposed the addition of j.n.d.'s. Fractionation and 
ratio production methods have been developed from 
Plateau's work and Delboeuf's equisection experiment 
[...] doubling stimuli, to 
Stevens' (cf. 1958c) work of the present day. 

A third device, chosen by Thurstone (1927a; 1927b) 
for the establishment of [...] was the 
frequency distributions of [...] 
values upon repetition of particular stimuli. This basic 
approach does not dictate, necessarily, that the variances, 
or discriminal dispersions, of the distributions 
are equal for stimuli having different mean affective 
values. Thurstone adopted the normal distribution on the 
basis of [...] and seemingly appropriate form. I 
am certain that he would have been quite willing to shift 
to a different distribution on the basis of experimental 
evidence or of [...]. The possibility of such 
a shift was explicit in several personal conversations. 
Thurstone was continually alert for data that indicated 
the need for unequal discriminal dispersions. 

Thurstone's scaling [...] extended to multiple 
[...] any area of perception by Richardson 
[...] triadic judgment quite reminiscent 
of Delboeuf's sense-distances. Having three 
stimuli, A, B, and C, each subject at each session was to 
state for which pair the sense-distance was greatest 
and for which pair the sense-distance was least. Using 
multivariate [...] distributions and methods [...] analysis 
[...] by Young and Householder (1938), 
[...] experimented with judgments concerning 
[...]. This beginning has been followed 
[...] (1941) study of perceived friendliness 
wi tween nations and Torgerson (1952; 1958, 
Ch. 31) and Messick (19560; 1956c). k 
àual differences, similar exten- 
iews to multidimensionality 


the major supporter of & 
Several early critics, [...] 
[...], maintained the view 
[...] of factors, bonds, and 
abilities. Kelley and Hotelling attempted, independently, 
to obtain the most important dimensions [...] 
the large field by solution for [...] 
components. Thurstone developed multiple factor analysis 
[...] theory to investigate the 
[...] dimensional fields. He 
[...] view of a single intellectual 
factor was too limiting [...] 


en study of PSychometric scales and 
Study of individuai differe 


a theory which involves 
both Scaling of stimuli and Study of hi 
individual differences., 


lazarsfeld's Latent 
ng Techniques (see 


h to Outline an 
approach which extends the Thurstonian unidimensional 
paired comparison theory to inclusion of 


Stimuli dud 
directions to each subject such that it is reasonable 
to represent his responses on a single dimensional scale. 
Such a domain might be in preferences among some class 
[...] afternoon snacks. 
[...], it is reasonable 
[...] indicate his rank order of preferences. 
He may have some difficulty [...] 
decisions between choices [...] 
but the task to make rank order is reasonable. Thus, 
there is to be unidimensionality for each individual. 
Multidimensionality for a group of individuals can be 
introduced by using a vector for each person and allowing 
the vectors for many people to have various directions 
in the space. The stimuli may be represented by vectors 
in this same space. 

Consider Point 1 in the mathematical outline which 
follows on page 162. The vectors $X_p$ represent various 
people. The axes and scaling of this space are so chosen 
that the variance of coordinates $x_{mp}$ on each dimension m 
is unity as per equation (1), and that the covariance 
between each pair of dimensions is zero as per equation 
(2). Different choices of axes might be made whenever 
there is good reason to do so. The $A_h$ of Point 4 are 
vectors to represent the stimuli. The basic model is to 
[...] each person's [...] given stimulus 
by the scalar product between his vector and 
the [...] vector. The $e_{hpt}$ of Point 5 are added so as to 
allow for random [...]. Equation (8) gives the basic scalar 
product formula with the addition of the $e_{hpt}$. 

[...] from Point [...] on is written 
for paired comparisons [...] 
and [...] scale values 
[...] postulated that in paired comparisons 
the [...] a pair of stimuli depends 
on the sign of the difference in 
[...] the stimuli. Equation (19) relates the $s_{(hi)pt}$ 
[...] vector structure. 
[...] of equations relating to the 
random fluctuations have [...]. They are included 
in the outline for the sake of completeness but do not 
greatly affect [...] 
[...] concern the mean scale 
[...] the scalar products of the 
stimuli vectors with [...] mean person vector. Thus this 
model degenerates [...] unidimensional case for mean scale 
values. 
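
The vector model just outlined can be illustrated with a small sketch. The numbers are hypothetical and, for simplicity, the person vectors below are not rescaled to unit variance and zero covariance; the function names are ours. With the random terms set to zero, the mean scale value of each stimulus over persons equals the scalar product of the stimulus vector with the mean person vector, while individual persons may still disagree in their paired comparisons.

```python
def dot(u, v):
    """Scalar (dot) product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def scale_value(stimulus, person, error=0.0):
    """Scale value: stimulus-by-person scalar product plus a random term."""
    return dot(stimulus, person) + error

def prefers(person, stim_i, stim_h):
    """True when the sign of the scale difference favors stimulus i over h."""
    return scale_value(stim_i, person) - scale_value(stim_h, person) > 0

# Hypothetical stimulus vectors and person vectors in two dimensions.
stimuli = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
persons = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0), (1.0, -1.0)]

n = len(persons)
mean_person = tuple(sum(p[m] for p in persons) / n for m in range(2))

# Mean scale value of each stimulus over persons (errors taken as zero).
mean_scale_values = [
    sum(scale_value(a, p) for p in persons) / n for a in stimuli
]
# The same quantities from the mean person vector alone.
from_mean_vector = [dot(a, mean_person) for a in stimuli]
```

Here the first person prefers the first stimulus to the second and the second person prefers the reverse, yet the mean scale values lie on a single dimension determined by the mean person vector.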
In order to obtain some hold on the space of the 
stimuli vectors, the variances and covariances [...] the 
$s_{(hi)pt}$ are considered as in equations (22) and (23). 
The relative mean scale differences $y_{(hi)}$ and the 
correlations between pairs $r_{(hi)(jk)}$ bring us to [...] that 
may be estimated from data. In case of a [...] 
people, the $y_{(hi)}$ may be [...] 
corresponding to the proportions [...] 
stimulus i over stimulus h. The $r_{(hi)(jk)}$ 
may be estimated by tetrachoric correlations between pairs 
of judgments. 


Six unique dimensions, one of which 
was unique to each stimulus. 


trast in liking for berries 


In Table 13-2 are the resu 
tions for the on Judgments, first by t 
Thurstone's Case ing our Y(ni). Note tha 
Using our Y(ni)- In fact; 
i-square test, which may 

Similar results have 
occurred on two previous experiments utilizing preferences 
for desserts. So far as I can [...] studies involving 
these same stimuli. In both other studies [...] 
preference dimension provided the contrast between a 
liking for berries and a liking for melons; the scale 
[...] were quite similar, and use of the variance from 
the preference structure in determining [...] yielded 
improvement in goodness of fit to the observed proportions. 

In Table 13-3 are results from an extension of the 
afternoon snacks experiment to choice of most preferred 
among triples of snacks. From the preference structure 
[...] that the ratios on the left half of the table 
[...] be less than those on the right. This is a form 
of the voter's paradox where similarity between two 
alternatives reduces the number of votes for each because 
of the splitting of votes for their general type (Thurstone, 
1945). Since there are nine independent pairs of 
triples for which the inequality may be predicted and, in 
fact, there are many ways of sorting the 18 triples into 
sets of nine such pairs, and since every one such sorting 
yields all nine inequalities in the direction predicted, 
the chance of obtaining the observations from random data 
is less than one in 512. These results are highly 
significant. 
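
The figure of one in 512 can be checked by a short computation. The sketch assumes, as a simplification of our own, that under random data each of the nine independent inequalities is equally likely to fall in either direction; the function name is hypothetical.

```python
from fractions import Fraction

def chance_all_predicted(n_pairs):
    """Probability that every one of n independent inequalities falls in
    the predicted direction when each direction has probability 1/2."""
    return Fraction(1, 2) ** n_pairs

p = chance_all_predicted(9)
```

Under these assumptions `p` is the probability of nine out of nine agreements, which is the bound quoted in the text.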

I wish to remark that the results for the triples cannot 
be predicted from any scaling system of which I know, 
which ignores individual differences. I wish to point 
out also, that these results indicate that Luce's (1959a) 
axiom is violated in a domain involving systematic individual 
differences as in the present case. In this connection, 
and in all fairness, it should be noted that 
Luce intended his axiom to apply to individual choice 
behavior and not, necessarily, to apply over groups of 
individuals. The present data validates this point. 

In conclusion, it seems of extreme importance 
that theories and models be developed which encompass 
multidimensionality both for the perceptions and affective 
responses of each individual and for individual differences 
in these responses. 
Mathematical Outline of Model for Intra-individual Unidimensionality 
and Inter-individual Multidimensionality 

1) Let the coordinates $x_{mp}$ on dimensions m of persons p 
be entries in matrix X with column vectors $X_p$ and 
row vectors $X_m$. 

2) Let the variances and covariances for rows of X be 

$\mathrm{Var}_p(x_{mp}) = 1$ (1) 

$\mathrm{Covar}_p(x_{mp}, x_{Mp}) = 0$ for $m \neq M$. (2) 

(By $\mathrm{Var}_p$ is meant variance over p and by $\mathrm{Covar}_p$ 
is meant covariance over p.) 

3) Let the row means of X be $\bar{x}_m$ and be entries in a 
column vector $\bar{X}$. 

4) Let the coordinates $a_{hm}$ for stimuli h on dimensions 
m be entries in matrix A with row vectors $A_h$. 

5) Let $e_{hpt}$ be a random variate such that: 

$E(e_{hpt}) = 0$ (3) 

(By E is meant expected value over [...].) 

$\mathrm{Var}_p(e_{hpt}) = \epsilon_h^2$ (4) 

$\mathrm{Covar}_p(e_{hpt}, e_{ipt}) = 0$, for $h \neq i$; (5) 

$\mathrm{Covar}_p(e_{hpt}, e_{ipT}) = 0$, for $t \neq T$; (6) 

$\mathrm{Covar}_p(e_{hpt}, x_{mp}) = 0$. (7) 

6) Let the scale value, $s_{hpt}$, of stimuli h for persons 
p at times t be related to the x's, a's, and e's by 
the linear equation 

$s_{hpt} = \sum_m a_{hm} x_{mp} + e_{hpt} = A_h X_p + e_{hpt}$. (8) 

7) $A_h X_p$ is the scalar, or dot product of vectors 
$A_h$ and $X_p$. 

8) The mean $\bar{s}_h$ of scale value $s_{hpt}$ over persons can 
be shown to be, from equations (3) and (8), 

$\bar{s}_h = A_h \bar{X}$. (9) 

9) Let coordinates for differences between stimuli h 
and i in pairs (hi) be $a_{(hi)m}$ and be related to the 
stimuli coordinates by: 

$a_{(hi)m} = a_{im} - a_{hm}$, (10) 

$A_{(hi)} = A_i - A_h$, (11) 

where $A_{(hi)}$ is a row vector. 

10) The differences, $e_{(hi)pt}$, are 

$e_{(hi)pt} = e_{ipt} - e_{hpt}$. (12) 

The differences, $s_{(hi)pt}$, in scale values of stimuli 
h and i for persons p at times t are 

$s_{(hi)pt} = s_{ipt} - s_{hpt}$. (13) 

11) From equations (3)-(7): 

$E(e_{(hi)pt}) = 0$; (14) 

$\mathrm{Var}_p(e_{(hi)pt}) = \epsilon_h^2 + \epsilon_i^2$, for $h \neq i$; (15) 

$\mathrm{Covar}_p(e_{(hi)pt}, e_{(jk)pT}) = 0$, for $t \neq T$; (16) 

$\mathrm{Covar}_p(e_{(hi)pt}, e_{(hj)pt}) = \epsilon_h^2$, for $i \neq j$; (17) 

$\mathrm{Covar}_p(e_{(hi)pt}, x_{mp}) = 0$. (18) 

12) From equations (8), (11), (12), and (13): 

$s_{(hi)pt} = A_{(hi)} X_p + e_{(hi)pt}$. (19) 

13) The mean, $\bar{s}_{(hi)}$, of $s_{(hi)pt}$ over p can be shown 
to be, from equations (14) and (19): 

$\bar{s}_{(hi)} = A_{(hi)} \bar{X}$; (20) 

and, from equation (13), 

$\bar{s}_{(hi)} = \bar{s}_i - \bar{s}_h$. (21) 

14) From equations (1), (2), (15), (16), (18), and (19): 

$\mathrm{Var}_p(s_{(hi)pt}) = A_{(hi)} A_{(hi)}' + \epsilon_h^2 + \epsilon_i^2$; (22) 

$\mathrm{Covar}_p(s_{(hi)pt}, s_{(jk)pT}) = A_{(hi)} A_{(jk)}'$, for $t \neq T$. (23) 

15) Let 

$y_{(hi)} = \dfrac{\bar{s}_{(hi)}}{\sqrt{\mathrm{Var}_p(s_{(hi)pt})}}$; (24) 

$r_{(hi)(jk)} = \dfrac{\mathrm{Covar}_p(s_{(hi)pt}, s_{(jk)pT})}{\sqrt{\mathrm{Var}_p(s_{(hi)pt}) \, \mathrm{Var}_p(s_{(jk)pT})}}$. (25) 

16) In case the $x_{mp}$ and $e_{hpt}$ are from a multivariate 
normal distribution with means, variances, and 
covariances as specified, then 

a) the $y_{(hi)}$ are deviates of a unit normal 
distribution corresponding to the proportion, 
$P_{(hi)}$, of judgments i > h, where this correspondence 
is that $P_{(hi)}$ is the area of the 
distribution to the left of $y_{(hi)}$; 

b) the $r_{(hi)(jk)}$ are the tetrachoric correlations 
between judgments i versus h, and k versus j. 
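
Points 14 to 16 can be illustrated numerically. The sketch below assumes the reading of equations (22) and (24) in which the relative mean scale difference is the mean difference divided by the standard deviation of the differences, and that the person coordinates have unit variance and zero covariance; the function names, the difference vector, and the mean person vector are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def relative_mean_difference(a_hi, x_bar, err_var_h=0.0, err_var_i=0.0):
    """Mean scale difference divided by the standard deviation of the
    differences (assumed forms of equations (20), (22), and (24))."""
    mean_diff = sum(a * x for a, x in zip(a_hi, x_bar))
    variance = sum(a * a for a in a_hi) + err_var_h + err_var_i
    return mean_diff / sqrt(variance)

def proportion_preferring_i(y_hi):
    """Area of the unit normal distribution to the left of the deviate."""
    return NormalDist().cdf(y_hi)

def deviate_from_proportion(p_hi):
    """Unit normal deviate corresponding to an observed proportion."""
    return NormalDist().inv_cdf(p_hi)

# Hypothetical difference vector A_(hi) and mean person vector.
y = relative_mean_difference((3.0, 4.0), (1.0, 0.0))
p = proportion_preferring_i(y)
```

Going the other way, `deviate_from_proportion` applied to an observed proportion gives the estimate of the relative mean scale difference described in Point 16a.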


[...] STRUCTURE FOR AFTERNOON SNACKS 
[...] Correlations Below Diagonal 
[...] Correlations Above Diagonal 

[...] 

Coordinates of Stimuli on Dimensions 

Common Dimensions, Unique Dimensions, Stimuli 

[...] Strawberries with cream 
[...] Blueberries with cream 
[...] Red Raspberries with cream 
[...] Chilled Watermelon 
[...] Chilled Cantaloupe Melon 
[...] Chilled Honeydew Melon 

*Data based on [...] 
students. 
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X = 2.90 
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P= .995 
TABLE 13-3. RATIOS OF FREQUENCIES OF 
CHOICE OF STIMULI FROM TRIPLES OF STIMULI 

Common Members of Triples, Third Member of Triples 

[...] 

Each ratio on the left is of the form [...] 
and each ratio on the right is of the form [...]. 
The B's stand for various berries and the M's for various 
[...]. Application of Luce's axiom would indicate equality of 
ratios in each [...]. Preference structure predicts that in 
each row, the ratios on the left, [...] triples having a berry 
as the third member, [...] than the ratios on the right, 
[...] having a melon [...] the third member. 


14 Robert P. Abelson 
Yale University 


Scales Derived by Consideration 


of Variance Components 


in Multi-Way Tables 


The usual application of discriminant functions is to 
the problem of separating the memberships of several 
groups into relatively distinct clusters along the most 
[...] dimensions [...] response. Discriminant analysis 
may be expressed in terms of analysis of variance in 
[...] way: Given several different measures on 
individuals within several groups, a linear combination 
of the measures must be formed so as [...] 
between-groups mean square relative to [...] 
mean square. 

Although it is known that [...] may 
[...] is one kind of example, the 
[...] appreciated. [...] 
provides the key. Instead of maximizing 
[...] relative to [...] 
of the analysis of variance, [...] 
square relative to an interaction, or an interaction 
relative to a within-cell [...], and so on. Any 
ratio of mean squares [...] but one dependent variable 
would constitute [...] F-ratio may be used 
as the basis for forming a best-discriminating linear 
combination of several dependent variables. This principle 
[...] illustrated in detail with the case of a 
[...] factorial table. 

This paper was presented while the author was on leave at the 
Center for Advanced Study in the Behavioral Sciences. 
Consider a group of R raters each rating C concepts 
on S scales. The "concepts" may themselves be individuals 
and the scales be a set of behavioral rating 
scales, as in sociometric or clinical ratings of the 
attributes of subjects. Or, as with Osgood's "semantic 
differential" (Osgood, Suci, & Tannenbaum, 1957), the 
concepts may be words or other stimuli and the scales 
may be sets of bipolar adjectives. If, in either example, 
a linear combination of the scales is formed, the 
three-way data table reduces to a two-way table: raters 
x concepts. Significance of variation among concepts is 
appropriately assessed by means of the ratio of the 
between-concept mean square to the concept x rater interaction 
mean square, provided that the raters are a random 
sample from a large population. Accordingly, the scales 
may be combined with the aim of maximizing this particular 
variance ratio. 


[...] the solution to the maximization [...] 
identical to the solution (Rao, 195[...]) [...] 
considerable interest. For the three-way table, raters 
by concepts by scales, there are seven computable mean 
squares: the three main effects, the three first-order 
[...] scales? The traditional thinking 
about the factor problem with three 
[...] been that information must [...] 
one "way," forming [...] 
another "way," and ignoring [...] 
(Cattell, 1952). In terms of [...] 
variance, five of the seven [...] 
ignored by such a method. 
[...] more of the information. The expression for the 
mean square for concepts involves [...] 
for concepts and, also, [...] 
by scales interaction; the mean [...] 
for concepts by [...] 
concepts by raters by [...] 
variance components are used. The three not used are raters, scales, 
and raters by scales. Since the analysis has been set 
up so as to discriminate between concepts, it is precisely 
those variance components which do not involve 
concepts that disappear from the analysis. 

The whole trick, it would seem, in casting factorable 
three-way data into the discriminant form is to dis- 
tinguish in each given case the objects of discrimination, 
the agents of discrimination, and the modes of discrimina- 
tion. In the rater, concept, scale example, the concepts 
are the objects of discrimination, the raters are the 
agents and the scales are the modes. Let us turn briefly 
to some other examples. 

Suppose a clinician confronts each of his subjects 
with several experimental conditions and records several 
response measures. Imagine that the response measures 
are quantitative, with a common scale available for all 
subjects. If any factor combinations are to be formed 
here, they would probably be among the responses. The 
responses would thus be the modes of discrimination, the 
subjects would be the objects of discrimination and the 
conditions the agents. The ratio to be maximized would 
be the between-subject mean square relative to the subject 
by conditions interaction. This example casts light 
on the role of the agents of discrimination. The investigator 
usually hopes that the discriminations between objects 
are extremely stable over the agents, but sometimes 
he is very unsure that this will be the case. For certain 
personality theories the presence of strong subject-condition 
interactions is anathema. It is appropriate 
for this interaction to serve as the error term in the 
discriminant ratio. 

Let us examine the situation for the experimentalist. 
Suppose he were to run a number of subjects in each of 
several experimental conditions and obtain several quantitative 
response measures for each. The responses would 
again serve as modes, but the agents and objects would be 
reversed from the way the clinician had them. For the 
experimentalist the experimental conditions are the objects 
of discrimination and subjects are the agents. The 
between-conditions mean square is to be maximized relative to 
the subject-condition interaction. Again, large subject-condition 
interaction is a nuisance. An excellent discussion 
of the differences in the emphases of clinician 
and experimenter has been given by Cronbach (1957). 
The experimental example has an interesting sidelight. 
Suppose the experimenter were interested in whether 
thirst was a well-behaved intervening variable, a unitary 
drive. His experimental conditions might involve several 
[...] the existence of thirst. The test 
[...] thirst as an intervening variable 
[...] of the sufficiency of a single 
[...]. 

[...] a three-way design, consider a 
[...] administration, replicated on one 
or more occasions on the same subjects. The objects of 
[...] subjects, the modes are the tests, 
and the occasions are the agents. The ratio to be maximized 
[...] between-subject mean square relative to the 
subject by occasions interaction. What is the meaning 
[...] which error variances are unknown 
[...] estimated. I am thinking of Rao [...] 
particular. If, however, occasions exerted an active 
differential influence [...] 
tests, but also error [...]. 
In this case, a discriminant 
solution would reveal [...] 
components [...] 
subject by condition effects. It is [...] 
in this context to speak of correlated errors. 

These and other various applications of the discriminant 
idea to three-way tables are summarized in Table 
14-1. Applications to more complicated designs have been 
omitted. From an examination of the table, the roles of 
agents, objects, and modes of discrimination may be restated 
as follows: Objects of discrimination are the 
things which you would be very disappointed if there were 
no differences between, modes are the manifestations of 
differences between objects, and agents are the things 
that mold or transform the differences. Modes, so to 
speak, are in the foreground, while agents are in the 
background. The discriminant solution requires [...] 
covariancing, and it is the modes 
[...] with one another. Interactions 
[...] agents serve as the explicit basis 
[...]. One wants differences between objects 
[...] respect to agents. Interactions 
[...], on the other hand, serve as 
[...] identifying the structure of the 
[...] (in common parlance, the factor 

Computational Formulae 

$X_{ijk}$ = the rating given by rater i (i = 1, 2, ..., r) 
to concept j (j = 1, 2, ..., c) on scale k 
(k = 1, 2, ..., s). 

$w_k$ = the weight assigned to scale k. 

(1) $y_{ij} = \sum_k w_k X_{ijk}$ = the rating by rater i of concept 
j on a linear combination of the scales. 

$(SS)_{tot}$ = total sum of squares (of the $y_{ij}$). 

$(SS)_r$ = between rater sum of squares. 

$(SS)_c$ = between concept sum of squares. 

$(SS)_{rc}$ = rater x concept interaction sum of squares. 

(2) $(MS)_c = (SS)_c / (c-1)$ = between concept mean square. 

(3) $(MS)_{rc} = (SS)_{rc} / [(r-1)(c-1)]$ = interaction mean square. 

(4) $Q = (MS)_c / (MS)_{rc}$ = the ratio [...] 
From standard analysis of variance formulas (a dot indicating 
an average over a subscript), 

(5) $(SS)_{tot} = \sum_i \sum_j y_{ij}^2 - rc\,\bar{y}_{..}^2$ 

(6) $(SS)_r = c \sum_i \bar{y}_{i.}^2 - rc\,\bar{y}_{..}^2$ 

(7) $(SS)_c = r \sum_j \bar{y}_{.j}^2 - rc\,\bar{y}_{..}^2$ 

(8) $(SS)_{rc} = (SS)_{tot} - (SS)_r - (SS)_c$. 

Substituting (1) in (5), (6), and (7) and simplifying, 
we find (using k' as an alternative subscript for scales): 

(9) $(SS)_{tot} = \sum_k \sum_{k'} w_k t_{kk'} w_{k'}$, 

where 

(10) $t_{kk'} = \sum_i \sum_j X_{ijk} X_{ijk'} - rc\,\bar{X}_{..k} \bar{X}_{..k'}$; 

(11) $(SS)_r = \sum_k \sum_{k'} w_k a_{kk'} w_{k'}$, 

where 

(12) $a_{kk'} = c \sum_i \bar{X}_{i.k} \bar{X}_{i.k'} - rc\,\bar{X}_{..k} \bar{X}_{..k'}$; 

(13) $(SS)_c = \sum_k \sum_{k'} w_k f_{kk'} w_{k'}$, 

where 

(14) $f_{kk'} = r \sum_j \bar{X}_{.jk} \bar{X}_{.jk'} - rc\,\bar{X}_{..k} \bar{X}_{..k'}$; 

(15) $(SS)_{rc} = \sum_k \sum_{k'} w_k g_{kk'} w_{k'}$, 

(16) $g_{kk'} = t_{kk'} - a_{kk'} - f_{kk'}$. 

The quantities $t_{kk'}$, $a_{kk'}$, $f_{kk'}$, and $g_{kk'}$ in (10), 
(12), (14), and (16) are computable (laboriously) from 
the original $X_{ijk}$. They are the breakdowns of sums of 
products between scales, analogous to the breakdown of 
sums of squares for a composite scale into total, between 
raters, between concepts, and interaction, respectively. 


Converting $f_{kk'}$ and $g_{kk'}$ to mean products by dividing 
by degrees of freedom: 

(17) $b_{kk'} = f_{kk'} / (c-1)$ 

(18) $e_{kk'} = g_{kk'} / [(r-1)(c-1)]$ 

(19) $(MS)_c = \sum_k \sum_{k'} w_k b_{kk'} w_{k'}$ 

(20) $(MS)_{rc} = \sum_k \sum_{k'} w_k e_{kk'} w_{k'}$ 

and the ratio Q to be maximized is the ratio of the two 
quadratic forms, (19) and (20). 
In matrix notation, 

(21) $Q = \dfrac{wBw'}{wEw'}$, 

where w is a 1 x s row vector of weights and B and E are 
the s x s matrices of $b_{kk'}$ and $e_{kk'}$. 

The maximization solution for w is [...] 
first characteristic vector of the matrix ($E^{-1/2} B E^{-1/2}$). 
The corresponding maximum value of Q is the largest characteristic 
root. Other characteristic vectors [...] 
first may also contribute to discrimination among concepts, 
as in the standard application of discriminant 
functions (Bock, 1956b; Rulon, 1951; Slater, 1956). 
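
The computation can be sketched for a small hypothetical table. The sketch below forms the mean product matrices B and E from a raters x concepts x scales array by way of the sums of products for total, raters, and concepts, and then maximizes Q. For brevity it handles only two scales and, instead of extracting characteristic vectors of $E^{-1/2} B E^{-1/2}$, it solves the equivalent condition det(B - Q E) = 0 directly as a quadratic; this substitution, the data, and all function names are ours, and E is assumed to be positive definite.

```python
from math import sqrt

def mean_products(X):
    """Return (B, E): between-concept and rater x concept interaction
    mean product matrices for ratings X[rater][concept][scale]."""
    r, c, s = len(X), len(X[0]), len(X[0][0])
    cells = [(i, j) for i in range(r) for j in range(c)]
    grand = [sum(X[i][j][k] for i, j in cells) / (r * c) for k in range(s)]
    rater = [[sum(X[i][j][k] for j in range(c)) / c for k in range(s)]
             for i in range(r)]
    concept = [[sum(X[i][j][k] for i in range(r)) / r for k in range(s)]
               for j in range(c)]
    B = [[0.0] * s for _ in range(s)]
    E = [[0.0] * s for _ in range(s)]
    for k in range(s):
        for k2 in range(s):
            correction = r * c * grand[k] * grand[k2]
            t = sum(X[i][j][k] * X[i][j][k2] for i, j in cells) - correction
            a = c * sum(m[k] * m[k2] for m in rater) - correction
            f = r * sum(m[k] * m[k2] for m in concept) - correction
            g = t - a - f  # interaction sum of products
            B[k][k2] = f / (c - 1)
            E[k][k2] = g / ((r - 1) * (c - 1))
    return B, E

def quadratic_form(w, M):
    n = len(w)
    return sum(w[k] * M[k][k2] * w[k2] for k in range(n) for k2 in range(n))

def q_ratio(w, B, E):
    """Ratio of between-concept to interaction mean square for weights w."""
    return quadratic_form(w, B) / quadratic_form(w, E)

def best_weights_two_scales(B, E):
    """Largest root of det(B - Q E) = 0 and a weight vector attaining it
    (two scales only; assumes E is positive definite)."""
    qa = E[0][0] * E[1][1] - E[0][1] ** 2
    qb = -(B[0][0] * E[1][1] + B[1][1] * E[0][0] - 2 * B[0][1] * E[0][1])
    qc = B[0][0] * B[1][1] - B[0][1] ** 2
    root = (-qb + sqrt(max(qb * qb - 4 * qa * qc, 0.0))) / (2 * qa)
    w = (B[0][1] - root * E[0][1], -(B[0][0] - root * E[0][0]))
    return root, w

# Hypothetical ratings: 2 raters x 3 concepts x 2 scales.
ratings = [
    [(1, 2), (3, 1), (5, 4)],
    [(2, 1), (3, 3), (7, 4)],
]
B, E = mean_products(ratings)
q_max, w_best = best_weights_two_scales(B, E)
```

Any other weighting of the two scales, for example using either scale alone or their simple sum, gives a smaller ratio than `q_max`, which is the sense in which the weights are best-discriminating.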

In the foregoing, we have followed the traditional 
thinking about discriminants, devolving on a ratio of 
two mean squares. Modern thinking would lead us to con- 
sider variance components, rather than mean squares; this 
is done in the next section. 

The Underlying Model, or, What Is Really Being Maximized 

Following Cornfield and Tukey (1956), we assume the 
"pigeonhole model" which makes a minimum of restrictive 
assumptions. Imagine a population of R raters and C concepts, 
from which r raters and c concepts are sampled 
independently and at random. Each rater (hypothetically, 
in the population) rates each concept (in the population) 
on each of s fixed scales, yielding a three-way table of 
values. The measurement or replication errors 
[...] with each of the observations ("pigeonholes") 
are assumed to be independent, though the variances of 
the errors need not be homogeneous. 

The values in the population are decomposable into eight 
components (a dash indicates a missing subscript): 


e tye Lt BE g t uae age 
+ Vp * Vere * Vrae * ^03 P 

where 

$V_{---}$ = the general mean, 

$V_{I--}$ = the main effect for rater I (I = 1, 2, ..., R), 

$V_{-J-}$ = the main effect for concept J (J = 1, 2, ..., C), 

$V_{--k}$ = the main effect for scale k (k = 1, 2, ..., s), 

$V_{IJ-}$ = interaction effect between rater I and 
concept J, 

$V_{I-k}$ = interaction effect between rater I and 
scale k, 

$V_{-Jk}$ = interaction effect between concept J and 
scale k, 

$V_{IJk}$ = second-order interaction among rater I, 
concept J, scale k, 

$\epsilon_{IJk}$ = replication error associated with pigeonhole 
IJk. 

For convenience it is assumed that each of the effects 
V sums in the population to zero over any index included 
in its subscript. 


The model for the observed values X, 


ijk is 
(23) Xi =Ý +V, + GE rb Vis. 
+ Ma de * am eo Va gx vie d 


ijk by equa- 
(17), ana (18). substi- 
eads to expres- 


kk' and oe) in terms of the components, V. 


Details will not be given, 


concepts.) 


(24) Eo) =x [C02 + 


(25) Bley) = [t 


k 
where 
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a 
ll 
M 
< 


Y #2 1 
xke "ET E Yo Pane? 
[^J] 
1 
bat 
RC = = EX V. 
R-1)(C-1) iz Y 
B = ZE Vr V. 
k RD) © TJ 7 
z x ZE Van 
kk' R-1)(C-1 IJk IJk' ” 
2 T 
d =— 55 Ele ) 
e, RC IJ IJk 


(Expectation is here defined over replications.) 


= O otherwise. 


EE , 


Formula (24) displays three types of components:
systematic variation between concepts, alone or in inter-
action with scales; "quasi-systematic" interaction between
concepts and raters, alone or in double interaction with
scales; and error variation. Formula (25) displays only
the latter two types. The middle term in (24) can only
be commensurate with the first term in (25) if we con-
sider R >> r, i.e., the population of raters as ex-
tremely large relative to the sample of raters. This is
the usual case.

The expected values of the quadratic forms (19) and
(20) become
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3 2 2 S ) 
Kk er) = At Wien x E KS «JG WP x 


+ B(Zw E opt es iae 
p oU mec 
EA a . 
k Ek 
2 2 roo) 
(27) ECT We Aes N kk 
SR 


Can become large if the 
the By, but 


© value of the sum 
xe WkYkk'Wk'; or if the Wk Correlate Poorly with the 
2 
o ete. 
€ 2 


(28) D=B-E 


2 


(29) Blaar kese so 


Then one may maximize simply the quantity wDw' (relative
to the sum of squares of the weights). The error and
"quasi-systematic" components have been eliminated from
the expected value, and only systematic components remain.
In essence, this solution reduces to the factorization of
the matrix D.
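
A minimal sketch of this difference method follows. It assumes B and E are symmetric s-by-s matrices, and it relies on the standard result that a quadratic form wDw', taken relative to the sum of squares of the weights, is maximized by the leading eigenvector of D. The function name and the numerical values of B and E are made up for illustration and do not come from the source.

```python
import numpy as np


def factor_difference_matrix(B, E, n_factors=1):
    """Return the largest eigenvalues of D = B - E and the matching weight
    vectors (one per row), ordered from largest to smallest eigenvalue."""
    D = np.asarray(B, dtype=float) - np.asarray(E, dtype=float)
    D = (D + D.T) / 2.0                       # guard against rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(D)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]
    return eigvals[order], eigvecs[:, order].T


if __name__ == "__main__":
    # Hypothetical 3-scale example; these numbers are not from the text.
    B = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 2.0]])
    E = np.array([[1.0, 0.2, 0.0],
                  [0.2, 1.0, 0.1],
                  [0.0, 0.1, 1.0]])
    values, weights = factor_difference_matrix(B, E, n_factors=2)
    w = weights[0]
    D = B - E
    # The ratio wDw' / ww' for the first weight vector equals the top eigenvalue.
    print(np.isclose(w @ D @ w / (w @ w), values[0]))
```

Because D is a difference of two matrices, it need not be positive semidefinite, so some eigenvalues may be negative; how many factors to retain is the significance question raised in the text.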

With either the ratio method or the difference method,
there are two problems which have not been approached here:
what to do about communalities (an identical set of
arguments pro and con apply here as in standard factor
analysis), and how to test for significance of factors.
Presumably the standard significance tests of multivariate
dispersion analysis (Anderson, 1958) are not
applicable with the general pigeonhole model.

It is interesting to note that the method Osgood,
Suci, and Tannenbaum (1957) reported for factoring
three-way semantic differential judgments is equivalent to
factoring the matrix T with entries $t_{kk'}$ [equation (10)].
If it is charitably assumed that all &yy: are negligible, 


then 


(30) $T = (c-1)B + (r-1)(c-1)E$,


(31) E(t - see + By + Der + Ya) 


‘kk! ) 
2 tin + XT u 
+ (c tl, BUS GH e, Ses 
What is being maximized (provided the diagonals of T are
not used as they stand, for they are inflated by error)
is the sum of all systematic and quasi-systematic variance
components involving concepts. The procedure results
in more factors than the other procedures, as the
components of variance are not separated out.
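
The relation in (30) can be checked numerically under one set of definitions. The equations that define T, B, and E, namely (10), (17), and (18), are not reproduced in this section, so the definitions in the sketch below are assumptions: B is taken as the between-concepts mean-products matrix, E as the concept-by-rater interaction mean-products matrix, and T as the cross-products of scores expressed as deviations from each rater's mean over concepts. With these assumed definitions, (30) holds as an algebraic identity. The function name is hypothetical.

```python
import numpy as np


def cross_product_matrices(x):
    """Compute T, B, E for data x of shape (r raters, c concepts, s scales).

    Assumed definitions (not confirmed by the source text):
      B: between-concepts mean products,
      E: concept-by-rater interaction mean products,
      T: cross-products of deviations from each rater's mean over concepts.
    """
    r, c, s = x.shape
    rater_mean = x.mean(axis=1, keepdims=True)         # shape (r, 1, s)
    concept_mean = x.mean(axis=0, keepdims=True)       # shape (1, c, s)
    grand_mean = x.mean(axis=(0, 1), keepdims=True)    # shape (1, 1, s)

    between = (concept_mean - grand_mean).reshape(c, s)
    B = r * (between.T @ between) / (c - 1)

    resid = (x - rater_mean - concept_mean + grand_mean).reshape(r * c, s)
    E = (resid.T @ resid) / ((r - 1) * (c - 1))

    within = (x - rater_mean).reshape(r * c, s)
    T = within.T @ within
    return T, B, E


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = rng.normal(size=(6, 4, 3))    # made-up ratings, for illustration only
    r, c, _ = x.shape
    T, B, E = cross_product_matrices(x)
    print(np.allclose(T, (c - 1) * B + (r - 1) * (c - 1) * E))
```

The identity holds because each deviation from a rater's mean splits into a concept effect plus an interaction residual, and the residuals sum to zero over raters, so the cross term vanishes.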
response pattern 0, 85-86
rubber-scale phenomenon, 
81 


scales, 
ability underlying the 
test, 97-98 
absolute rating, 68
bisection, 52 
category, 17; 52-53, 92» 
57, 59 
SES rating, 9, 21- 
22 
i viii, 49-50, 
52-55 
equal-interval, 
hedonic, l 
ape i 145-149, 152-153 
interval, 55, 120
ind, 55, 22: 5T 
joint, 1 R 
logarithmic interval,


i 
Aere: 55, 65 


8, 15, 18 
scales (contd)
magnitude ratio, 57 
metathetic, 39, 49, 52- 
55, DT 
Munsell, 55 
observed-score, 98 
ordered metric, 130 
ordinal, 135-136 
partition, 49, 52 
physical, 25-2 
prothetic, 39, 49-53, 
55, 57 
rating, 21-22 
ratio, viii, 49, 55-55, 
57, 62 
response, ix 
subjective, 51, 59-60 
subjective evaluation, 81 
successive categories, 
viii, 7-8, 11, 15-15, 
17, 19, 50 
successive intervals, 12, 
17 
true score, 98 
utility, 119, 123-124 
score, 
ability, 97-98 
observed, 98-99, 101-107 
true score, 98-107 
semantic differential, 170 
method for factoring, 181 
sensory magnitude, vii, ix,
54, 65
SEU (subjectively expected 
utility) model, ix, 
13345 115-0555 Is 122- 
123, 126-127 
maximization, 114, 117, 
192, 127 
SEV (subjectively expected 
value) model, 111, 125 
Subject Index


significance tests of multivariate dispersion
analysis, 181

of factors, 181 

similarity factor, 34-35 

social utility, 147-148,
150-151, 153 

SP (subjective probability),
111-120, 123-124, 106.
127, 130-131

addition theorem for, 120 
concept, 109 


curve, 117 

functions, 112, 122 

in ASEU and NASEU models,
113 


measures, 115, 122 
properties, 125
scaling, 125
transformation, 122
values, non-additive,

118-119 
Stevens' general psychophysical law, 50
stimulus magnitude, 50, 55
subjective evaluation scale,
81
subjective magnitude,
21-52, 54, 57
subjective ratio, 65 
successive categories, 
method of, viii, 7-8, 


11, 15-15, 17-19, 50 
model, 45 


successive categories 
Scales, viii, 7-8 
13-15, 17, 19, 50 

successive intervals, 
method of, viii, 7-8, 


hg, 


pd 


11, 15-15, 17-19, 50 
Successive intervals Scales, 
185 17 


systematic bias, 55
systematic power, 139 


test theory, 83-85, 156 
tetrads, method of, 4 
Thurstone's Case V, 50-52, 
160 
Thurstone's criterion for
zero point, 77
Thurstone's judgment scaling
model, 157
discriminal dispersions,
157
frequency distribution 
of perceptual or 
affective values, 157 
multidimensional exten- 
sions, 157-158 
normal distribution, 157 
paired comparisons, 159-
160
Thurstone's psychophysical
equation, 51
traceline (or operating characteristic),
85-87, 9
91, 93-94
of test items, 97 
normal-ogive, 97 
transformation groups,
interval scale, 112-114, 
126 
identity scale, 112-114 6 
ratio scale, 112-114, 12 
triadic judgment, 157
true score, 83-84
true score distribution, 104- 
105, 107 
moments of, 105 
Tucker's method of congru- 
ence, 150 


unfolding technique, Coombs', 
129, 146-154, 158 
or Preferential choice 
behavior, 145 


multidimensional, ix, 146- 
154 




uniform variability, adjustments for, 77
utility, 115, 118-119, 125, 
126-127 
curve, 117, 119-120 
defined, 109-110 
directed distances, 132 
function, 117, 119, 122, 
124, 130-132 
intervals, 132 
measurement of, viii, 
ix, 122, 130 
properties, 125 
scale, 119, 123-124 
social, 147-148, 150-151, 
155 
values, 132 
zero, 119-120 
utility (or subjective) 
value concept, 109 
subjective value, 110 
subjective value scale, 


120 


variability, 
in measurement, 65 
observer's modulus, 65 
observer's subjective 


ratio, = 
variability (contd)
sense-organ character- 
istics, 
variance, analyses of, 169- 
172 
variance components, 170-
171, 177 
error variation, 179 
measurement of replica- 
tion errors, 177 
pigeonhole model, 177 
quasi-systematic, 179-181 
systematic, 179 181 
variance preferences, 120, 
125-127 
vectors,
difference, 159
individual, 159 
space of stimulus, 
stimulus, 159 
structure, 159-160 
vector model for multidimen- 
sional scaling (also 
inter-individual multi- 
dimensionality), ix, 


160 


158-165 

Weber's law, 50, 52 

WSEU (weighted SEU) model, 
123, 125 


